
## R Markdown

title: “week1assigment”
author: “Ndisha Mwakala”
date: “1/11/2020”
output: html_document

library(tidyverse)

## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.3.0 --

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Week 1 exercises: Chapter 3; 3.3.1

Question1: What’s gone wrong with this code? Why are the points not blue?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

#Answer: The points are not blue because color = "blue" is placed inside aes(). Inside aes(), ggplot2 treats "blue" as a variable with a single value and maps it to the first colour of its default scale, adding a legend. To set a colour manually, color = "blue" must be given outside aes(), as an argument to geom_point().
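A minimal sketch of the fix the answer describes, assuming ggplot2 is installed and using its bundled mpg data. The name p_blue is only for illustration.

```r
library(ggplot2)  # provides ggplot(), geom_point() and the mpg dataset

# color = "blue" is a fixed setting, so it sits outside aes();
# only x and y are mapped to variables.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

print(p_blue)
```

With the setting outside aes(), no colour legend is drawn, because colour no longer represents a variable.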

#Question2: Which variables in mpg are categorical? Which variables are continuous?
#(Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?

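#Answer: Categorical variables: manufacturer, model, trans, drv, fl, class
#Continuous variables: displ, year, cyl, cty, hwy (year and cyl are stored as integers and take only a few distinct values, so they can also be treated as categorical)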

#Q2b: How can you see this information when you run mpg?
#Answer: When mpg is printed, the row under the variable names shows each variable's type, e.g. <chr> for character (categorical) variables and <int> or <dbl> for numeric ones.
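The column types can also be listed directly. This is a small sketch using base R on the mpg data from ggplot2; col_types is an illustrative name.

```r
library(ggplot2)  # provides the mpg dataset

# One type per column: "character" columns are categorical,
# "integer" and "numeric" columns hold numbers.
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
print(col_types)
```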

#Question3: Map a continuous variable to color, size, and shape.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = displ))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = year))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = model))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, size = displ))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, size = year))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, size = model))

## Warning: Using size for a discrete variable is not advised.

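#How do these aesthetics behave differently for categorical vs. continuous variables?
#Answer: For a continuous variable (displ, year), color uses a gradient from one colour to another and size scales the points smoothly with the value. For a categorical variable (model), color gives each level its own distinct colour and size assigns one size per level.

#For size, R gives a warning message when a categorical variable (model) is used for the aesthetic but still plots the values. A continuous variable cannot be mapped to shape: ggplot2 stops with an error, so shape only works with categorical variables.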
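Question 3 also asks about shape, which the plots above do not try. The sketch below, assuming a recent ggplot2, builds the plot inside tryCatch() so the script keeps running. The exact wording of the error message varies between ggplot2 versions, so it is not reproduced here.

```r
library(ggplot2)

# Mapping a continuous variable (displ) to shape
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = displ))

# The error is raised when the plot is built, not when it is defined,
# so force the build and record whether it succeeded.
shape_result <- tryCatch(
  {
    ggplot_build(p_shape)
    "built"
  },
  error = function(e) "error"
)

print(shape_result)
```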

#Question4: What happens if you map the same variable to multiple aesthetics?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = year, size = year))

#Answer: The plot is still drawn, but both aesthetics carry the same information, which is redundant. In the case above, the latest year has larger dots and is colored as shown on the scale.

#Question5: What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, size=trans, stroke=5))

## Warning: Using size for a discrete variable is not advised.

#Answer: The stroke aesthetic modifies the width of the border. It works with shapes that have borders, such as the filled shapes 21 to 25, where colour sets the border and fill sets the inside.
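A short sketch of stroke on a bordered shape, with all values set outside aes() because they are fixed settings rather than variables. The particular sizes and colours are arbitrary choices for illustration.

```r
library(ggplot2)

# Shape 21 is a circle with a separate border (colour) and inside (fill);
# stroke sets the thickness of that border.
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, colour = "black", fill = "white",
    size = 3, stroke = 2
  )

print(p_stroke)
```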

#Question6: What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you’ll also need to specify x and y.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = displ<5))

#Answer: The expression is evaluated for every row, and the resulting TRUE/FALSE value is mapped to the aesthetic. In the case above, the values are categorised into either less than 5 (TRUE) or greater than or equal to 5 (FALSE), and colored accordingly.
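The expression can be evaluated on its own to see what ggplot2 receives. A small sketch, where under_five is an illustrative name:

```r
library(ggplot2)  # provides the mpg dataset

# displ < 5 yields one TRUE/FALSE value per row of mpg
under_five <- mpg$displ < 5

class(under_five)   # the type of the derived variable
table(under_five)   # how many rows fall in each group
```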

#Week 1 exercises: Chapter 3; 3.5.1

#Question1: What happens if you facet on a continuous variable?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ displ, nrow = 2)

#Answer: The output is hard to read because R treats every distinct value of the continuous variable as its own category and draws one panel per value, which leads to too many panels.
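The number of panels equals the number of distinct values of the faceting variable, which can be counted before plotting. A quick sketch with illustrative names:

```r
library(ggplot2)  # provides the mpg dataset

# Panels produced by facet_wrap(~ displ) vs. facet_wrap(~ class)
n_displ_panels <- length(unique(mpg$displ))
n_class_panels <- length(unique(mpg$class))

print(c(displ = n_displ_panels, class = n_class_panels))
```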

#Question2: What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = cyl)) +
  facet_grid(drv ~ cyl)

#Answer: The empty cells mean that there are no points to plot for that combination of drv and cyl, e.g. there is no 4-wheel drive car with 5 cylinders. They correspond to the positions in the plot of drv against cyl where no point is drawn.
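A cross-tabulation shows the same thing numerically: every zero in the table is an empty cell in the grid. A minimal sketch using base R's table(); drv_by_cyl is an illustrative name.

```r
library(ggplot2)  # provides the mpg dataset

# Rows are drv levels, columns are cyl values;
# a count of 0 means no cars have that combination.
drv_by_cyl <- table(mpg$drv, mpg$cyl)
print(drv_by_cyl)
```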

#Question3: What plots does the following code make? What does . do?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

#Answer: Both plots show hwy against displ. facet_grid(drv ~ .) splits the plot into rows, one per value of drv, and facet_grid(. ~ cyl) splits it into columns, one per value of cyl. The . is a placeholder that completes the formula when no faceting variable is wanted for the rows or for the columns.
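The row and column arrangement can be checked without drawing anything by inspecting the built plot's layout table. This sketch assumes ggplot2 3.0 or later, where ggplot_build() returns a layout object with ROW and COL columns. The variable names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# One row of this table per panel, with its grid position
rows_layout <- ggplot_build(base_plot + facet_grid(drv ~ .))$layout$layout
cols_layout <- ggplot_build(base_plot + facet_grid(. ~ cyl))$layout$layout

# drv ~ . stacks panels vertically; . ~ cyl places them side by side
print(c(rows = max(rows_layout$ROW), cols = max(rows_layout$COL)))
print(c(rows = max(cols_layout$ROW), cols = max(cols_layout$COL)))
```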

#Question4: Take the first faceted plot in this section:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

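#Question4b: What are the advantages to using faceting instead of the colour aesthetic?
#Answer: Faceting helps with quick visuals: each class gets its own panel, so one can easily see how the data is distributed within a class without points from other classes overlapping it.

#What are the disadvantages?
#Answer: Comparing classes is harder because their points sit in separate panels, and a variable with many levels produces many small panels.
#How might the balance change if you had a larger dataset?
#Answer: With more points, a single coloured plot becomes crowded and the colours harder to tell apart, so faceting becomes more useful. With too many rows and columns of panels, however, it is no longer possible to understand the data visually.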

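#Question5: Read ?facet_wrap. What does nrow do? What does ncol do?
#Answer: nrow determines the number of rows of panels that will be displayed, while ncol determines the number of columns.
#What other options control the layout of the individual panels?
#Answer: as.table and dir control the order and direction in which the panels are laid out, scales controls whether panels share axes, and strip.position (which replaces the deprecated switch) controls where the panel labels go.
#Why doesn’t ?facet_grid() have nrow and ncol arguments?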

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

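#Answer: Because facet_grid() forms a matrix of panels defined by the row and column faceting variables, the number of rows and columns is fixed by the number of unique levels of those variables, so they don't have to be specified. facet_wrap(), by contrast, produces a one-dimensional sequence of panels that has to be wrapped into rows and columns.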
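For the grid above, the dimensions follow directly from the data. A short sketch that counts the levels, with illustrative names:

```r
library(ggplot2)  # provides the mpg dataset

# facet_grid(drv ~ cyl): rows come from drv, columns from cyl
grid_rows <- length(unique(mpg$drv))
grid_cols <- length(unique(mpg$cyl))

print(c(rows = grid_rows, cols = grid_cols))
```

For mpg this gives a 3 by 4 grid, which is why nrow and ncol would have nothing left to control.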

#Question6: When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?