library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Introduction

This project aims to answer the questions contained in Chapters 3 and 5 of the text. I hope to find relationships in the mpg dataset and also answer the underlying questions that arise from analyzing the data.

### Exercise 3.2.4

2. How many rows are in mpg? How many columns?

### Answer

There are 234 rows and 11 columns in mpg.
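A quick way to confirm these counts, assuming ggplot2 is installed (it provides mpg): base R's dim() returns rows and columns together, and nrow() and ncol() return them separately.

```r
library(ggplot2)  # provides the mpg dataset

# dim() returns c(rows, columns) in one call
dim(mpg)   # 234 11

# nrow() and ncol() return each part separately
nrow(mpg)  # number of observations (cars)
ncol(mpg)  # number of variables
```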

3. What does the drv variable describe? Read the help for ?mpg to find out.

### Answer

The drv variable describes the type of drive train: f for front-wheel drive, r for rear-wheel drive, and 4 for four-wheel drive.
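To see which codes actually occur in the data, here is a short sketch using dplyr's count(). It only tabulates the values; the meaning of each code still comes from ?mpg.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# One row per distinct drv code, with the number of cars (n) that have it
drv_counts <- count(mpg, drv)
drv_counts
```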

### Exercise 3.3.1

1. What's gone wrong with this code? Why are the points not blue?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

### Answer

The color argument is inside aes(), so ggplot2 treats "blue" as a data value rather than as a color. It creates a one-level categorical variable whose only value is "blue" and colors it with the default palette, so the points come out reddish and the legend shows the label "blue". To set a fixed color, close aes() before color and pass color = "blue" as an argument of geom_point() itself:

ggplot(data = mpg)+
  geom_point(mapping = aes(x= displ, y= hwy), color= "blue")
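The difference between the two versions can be checked without looking at the plots. This sketch uses ggplot2's layer_data(), which returns the data frame a layer actually draws; the object names p_inside and p_outside are only illustrative.

```r
library(ggplot2)

# color = "blue" inside aes() is treated as data: a one-level category
# that the default colour scale then colours (illustrative name)
p_inside <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# color = "blue" outside aes() sets every point to the colour blue
p_outside <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

unique(layer_data(p_inside)$colour)   # a default palette colour, not "blue"
unique(layer_data(p_outside)$colour)  # "blue"
```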

2. Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset.) How can you see this information when you run mpg?

### Answer

The categorical variables in mpg are manufacturer, model, drv, class, fl, and trans. The continuous variables are displ, year, cyl, cty, and hwy. When you print mpg, the type is shown under each column name: <chr> (character) marks the categorical variables, and <dbl> or <int> marks the numeric ones.

| Variable name | Categorical or continuous |
|:--------------|:--------------------------|
| manufacturer  | categorical               |
| model         | categorical (character)   |
| displ         | continuous                |
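The column types shown in the tibble header can also be listed directly. Here is a minimal sketch with sapply() and class(); col_types is an illustrative name. Note that year and cyl come back as "integer" rather than "numeric" because they hold whole numbers.

```r
library(ggplot2)  # provides the mpg dataset

# class() of every column: "character" columns are categorical,
# "numeric"/"integer" columns hold numbers
col_types <- sapply(mpg, class)
col_types
```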

3. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?

### Answer

Mapping cty to color:

ggplot(data= mpg) + 
  geom_point(mapping=aes(x= displ, y = hwy, color = cty ))

Here cty is shown on a continuous color gradient, from dark blue for low values to light blue for high values, with the legend running from about 10 to 35.

Mapping cty to size:

ggplot(data= mpg) + 
  geom_point(mapping=aes(x= displ, y = hwy, size = cty ))

Here, cty is shown by the size of the black points: cars with higher city mileage get larger points.

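Mapping cty to shape:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, shape = cty))

This gives an error because a continuous variable cannot be mapped to shape. Shapes have no natural numeric order, so there is no sensible way to spread them across a continuous range of values. In short, a continuous variable is shown as a color gradient or a smooth range of point sizes, whereas a categorical variable gets a distinct hue or shape for each level.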
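The contrast between continuous and categorical mappings can also be checked programmatically. This sketch uses ggplot2's layer_data(), which returns the colour actually assigned to each point, and wraps ggplot_build() in tryCatch() so the shape error is captured instead of stopping the script. The object names are illustrative.

```r
library(ggplot2)

# Continuous variable (cty) on colour: a gradient with many distinct shades
p_cont <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty)) + geom_point()
n_cont <- length(unique(layer_data(p_cont)$colour))

# Categorical variable (drv) on colour: one distinct hue per level
p_cat <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) + geom_point()
n_cat <- length(unique(layer_data(p_cat)$colour))

c(continuous = n_cont, categorical = n_cat)

# Continuous variable on shape: building the plot raises an error,
# which tryCatch() turns into the error message string
p_shape <- ggplot(mpg, aes(x = displ, y = hwy, shape = cty)) + geom_point()
shape_result <- tryCatch(
  {
    ggplot_build(p_shape)
    "built"
  },
  error = function(e) conditionMessage(e)
)
shape_result
```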

4. What happens if you map the same variable to multiple aesthetics?

### Answer

For example, map cty to y, size, and color:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = cty, color = cty, size = cty))

ggplot2 accepts this and encodes cty three times, so the color and size legends repeat the information already on the y axis. The redundant encodings add nothing and make the graph harder to read, so it is not an ideal way to analyze the data.

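5. What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)

### Answer

stroke controls the width of a point's border. It is most useful with shapes 21 to 25, which have a border (set by color) and a separate interior (set by fill), so stroke thickens or thins the border around the filled interior.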
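A small sketch of stroke on a filled shape (shape 21). The size = 3 setting is an assumption chosen only so the border is easy to see; layer_data() is used to confirm the stroke value each point receives.

```r
library(ggplot2)

# Shape 21 is a circle with a border (colour) and an interior (fill);
# stroke = 2 draws a thick black border around each white interior
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 3, stroke = 2)
p
```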

6. What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you'll also need to specify x and y.

### Answer

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))

ggplot2 evaluates displ < 5 for every car, producing a logical (TRUE/FALSE) variable, and maps it to color like a two-level categorical variable: FALSE is drawn in red and TRUE in blue-green.
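The comparison really is computed row by row. This sketch (illustrative object names) tabulates the logical vector and counts the distinct colours ggplot2 assigns to it.

```r
library(ggplot2)

# The expression is evaluated for every row, giving a logical vector
is_small <- mpg$displ < 5
table(is_small)

# ggplot2 treats that logical vector like a two-level categorical variable
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = displ < 5)) + geom_point()
length(unique(layer_data(p)$colour))
```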

### Exercise 3.5.1

1. What happens if you facet on a continuous variable?

### Answer

ggplot2 treats the continuous variable as categorical and creates a separate panel for each unique value, which can produce a very large number of panels.
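To see how many panels a continuous facet produces, this sketch counts the distinct PANEL values in layer_data() and compares them with the number of distinct cty values. With facet_wrap(), every panel here contains at least one car, so the two numbers should match.

```r
library(ggplot2)

# Faceting on cty, a continuous (integer) variable: one panel per distinct value
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ cty)

# layer_data() records which panel each point is drawn in
n_panels <- length(unique(layer_data(p)$PANEL))
c(panels = n_panels, distinct_cty = length(unique(mpg$cty)))
```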

2. What do the empty cells in a plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?

### Answer

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = hwy, y = cty))+
  facet_grid(drv ~ cyl)

The empty cells are combinations of drv and cyl with no observations in the data, for example rear-wheel-drive cars with 4 cylinders.
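The empty combinations can be listed directly. Here is a minimal sketch using tidyr's expand(), which gives every possible drv/cyl pair, and dplyr's anti_join(), which keeps the pairs with no matching rows. The object names are illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)
library(tidyr)

# Every combination of drv and cyl that could appear in the grid
all_combos <- expand(mpg, drv, cyl)

# Combinations that actually occur in the data
observed <- distinct(mpg, drv, cyl)

# Combinations with no cars: these are the empty facet_grid(drv ~ cyl) panels
empty_cells <- anti_join(all_combos, observed, by = c("drv", "cyl"))
empty_cells
```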

3. What plots does the following code make? What does . do?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

### Answer

The . means "do not facet in this dimension". drv ~ . facets by drv in rows, stacking the panels vertically, while . ~ cyl facets by cyl in columns, placing the panels side by side.

4. Take the first faceted plot in this section:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

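What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?

### Answer

The advantage of faceting over the colour aesthetic is that each category gets its own panel, so many categories stay visible and distinct instead of relying on colors that become hard to tell apart. A possible disadvantage is that comparing categories takes more effort, because the points are spread over several separate plots instead of sharing one set of axes. With a larger dataset, overlapping points make colors harder to read in a single plot, so faceting becomes more attractive.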

5. Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn't facet_grid() have nrow and ncol arguments?

### Answer

nrow and ncol set the number of rows and columns of panels that facet_wrap() uses when it wraps a one-dimensional sequence of panels into a grid. Other options that control the layout include dir, as.table, scales, and strip.position. facet_grid() does not need nrow and ncol because its numbers of rows and columns are already fixed by the number of unique values of the variables in its formula.
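As a sketch of how nrow and ncol interact, ggplot2 exports wrap_dims(), a helper that turns a number of panels into a c(rows, columns) grid. It is assumed here that facet_wrap() applies the same arithmetic, where fixing one dimension makes the other ceiling(n / fixed).

```r
library(ggplot2)

# Seven classes of car means seven panels
n_panels <- length(unique(mpg$class))

# wrap_dims() returns c(rows, columns) for n wrapped panels
wrap_dims(n_panels, nrow = 2)  # rows fixed at 2, columns derived: 2 4
wrap_dims(n_panels, ncol = 2)  # columns fixed at 2, rows derived: 4 2

# The same argument passed to facet_wrap()
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 2)
```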

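6. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?

### Answer

Plots are usually wider than they are tall, so there is more horizontal space; putting the variable with more levels in the columns gives each panel more room.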

### Exercise 3.6.1

1. What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

### Answer

A line chart uses geom_line(), a boxplot geom_boxplot(), a histogram geom_histogram(), and an area chart geom_area().

2. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

### Answer

As I predicted, this is a scatterplot of displ against hwy with the points colored by drive type, plus one smoothed line per drive type in the matching color and without confidence bands.

3. What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?

### Answer

show.legend = FALSE hides the legend for that layer; if you remove it, the legend is drawn again. I think it was used earlier to give more room and attention to the plot itself.

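4. What does the se argument to geom_smooth() do?

### Answer

se controls whether a confidence interval is drawn around the smoothed line. With se = TRUE (the default) a shaded band shows the confidence interval computed from the standard error; with se = FALSE only the line is drawn.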
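One way to see what se changes is to compare the data that stat_smooth() computes with and without it. This sketch sets method = "loess" and formula = y ~ x explicitly (what the default picks for a dataset this size, as the messages elsewhere in this document show) only to silence the message.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy))

# se = TRUE (the default): the stat also returns ymin, ymax and se for the band
with_se <- layer_data(base + geom_smooth(method = "loess", formula = y ~ x, se = TRUE))

# se = FALSE: only the fitted curve (x, y) is computed, so no band is drawn
without_se <- layer_data(base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE))

c(with_se = "ymin" %in% names(with_se), without_se = "ymin" %in% names(without_se))
```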

5. Will these two graphs look different? Why/why not?

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

### Answer

No, they will look the same. Both use the same data and the same mappings; the first sets them globally in ggplot() so every layer inherits them, and the second repeats them locally in each layer.
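The claim can be checked by comparing the computed layer data of both plots. Here is a sketch using layer_data() and all.equal(), with method and formula set explicitly only to silence the smoothing message; the object names are illustrative.

```r
library(ggplot2)

# Mappings set once in ggplot() and inherited by every layer
p_global <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# The same data and mappings repeated in each layer
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# Compare the computed data behind layer 1 (points) and layer 2 (smooth)
same_points <- isTRUE(all.equal(layer_data(p_global, 1), layer_data(p_local, 1)))
same_smooth <- isTRUE(all.equal(layer_data(p_global, 2), layer_data(p_local, 2)))
c(points = same_points, smooth = same_smooth)
```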

6. Recreate the R code necessary to generate the following graphs.

### Answer

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth(se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth(se=FALSE, mapping = aes(x=displ, y= hwy, group = drv))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se=FALSE, mapping = aes(x=displ, y= hwy, group = drv))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(aes(x = displ, y = hwy, color = drv)) + 
  geom_smooth(se = FALSE, mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(aes(x = displ, y = hwy, color = drv)) + 
  geom_smooth(se = FALSE, mapping = aes(x = displ, y = hwy, linetype = drv))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(aes(x = displ, y = hwy), shape = 21, size = 5, fill = "white", color = "white") + 
  geom_point(aes(x = displ, y = hwy, fill = drv, color = drv), shape = 21, size = 2)

### Exercise 3.7.1

1. What is the default geom associated with stat_summary()? How could you rewrite the previous plot to use that geom function instead of the stat function?

### Answer

The default geom is geom_pointrange(). To rewrite the plot, call geom_pointrange() with stat = "summary" and pass the same summary functions for the minimum, maximum, and midpoint.
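Here is a sketch of the rewrite, assuming the "previous plot" is the stat_summary() plot of depth by cut from the book (median with minimum and maximum); that plot is not reproduced in this document.

```r
library(ggplot2)  # provides the diamonds dataset

# stat = "summary" makes geom_pointrange() compute the summary itself:
# fun gives the point (median), fun.min/fun.max give the ends of the range
p <- ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary",
    fun.min = min,
    fun.max = max,
    fun = median
  )
p
```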

2. What does geom_col() do? How is it different to geom_bar()?

### Answer

geom_col() uses the y values in the data as the bar heights (it uses stat_identity()), while geom_bar() counts the number of observations in each x group (stat_count()) and uses those counts as the bar heights.
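The two geoms can produce the same chart; the difference is who does the counting. Here is a minimal sketch (illustrative object names) that uses dplyr's count() to precompute the heights for geom_col().

```r
library(ggplot2)
library(dplyr)

# geom_bar(): stat_count() counts the rows in each class
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# geom_col(): the bar heights must already be in the data
class_counts <- count(mpg, class)
p_col <- ggplot(class_counts, aes(x = class, y = n)) + geom_col()

# Both approaches produce the same set of bar heights
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
all(sort(bar_heights) == sort(col_heights))
```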

3. Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?

### Answer

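4. What variables does stat_smooth() compute? What parameters control its behaviour?

### Answer

stat_smooth() computes y (the predicted value), ymin and ymax (the lower and upper limits of the confidence interval), and se (the standard error). Its behaviour is controlled by parameters such as method, formula, se, span, level, n, and fullrange, which determine the smoothing method and how the confidence interval is calculated.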
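The computed variables, and the effect of one controlling parameter (level), can be inspected with layer_data(). This sketch assumes a loess smooth; the mean band width is only a rough summary of how wide the interval is.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy))

# The same loess fit with two confidence levels
narrow <- layer_data(base + geom_smooth(method = "loess", formula = y ~ x, level = 0.80))
wide   <- layer_data(base + geom_smooth(method = "loess", formula = y ~ x, level = 0.99))

# Columns include the computed variables x, y, ymin, ymax and se
names(narrow)

# A higher confidence level gives a wider band (ymax - ymin)
c(level_80 = mean(narrow$ymax - narrow$ymin),
  level_99 = mean(wide$ymax - wide$ymin))
```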

5. In our proportion bar chart, we need to set group = 1. Why? In other words, what is the problem with these two graphs?

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))

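### Answer

Without group = 1, ggplot2 groups the data by the x variable (cut), so stat_count() computes each proportion within its own group and every bar has a height of 1. Setting group = 1 puts all the bars into a single group, so each proportion is relative to the whole dataset.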
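The difference shows up in the computed prop column. Here is a sketch using layer_data() on both versions; p_bad and p_good are illustrative names.

```r
library(ggplot2)  # provides the diamonds dataset

# Without group = 1 each cut is its own group, so every proportion is 1
p_bad <- ggplot(diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop)))

# With group = 1 all bars share one group, so the proportions sum to 1
p_good <- ggplot(diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))

layer_data(p_bad)$prop
sum(layer_data(p_good)$prop)
```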

### Exercise 3.8.1

1. What is the problem with this plot? How could you improve it?

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point()

### Answer

Many points overlap, because cty and hwy are whole numbers, so some observations are hidden. Using geom_jitter() (or position = "jitter") adds a small amount of random noise to each point so that all of them become visible.

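2. What parameters to geom_jitter() control the amount of jittering?

### Answer

width and height control the amount of horizontal and vertical jitter: each point is moved by a random amount of up to width (or height) in either direction. If they are not set, both default to 40% of the resolution of the data.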
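Here is a sketch of the two parameters that jitters only horizontally by setting height = 0. Because jitter is random, the check looks at ranges rather than exact positions.

```r
library(ggplot2)

# Move each point horizontally by at most 0.3 units; height = 0 leaves y alone
p <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0)
jittered <- layer_data(p)

# y values are unchanged, x values stay within 0.3 of the original range
all(sort(jittered$y) == sort(mpg$hwy))
range(jittered$x)
```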

3. Compare and contrast geom_jitter() with geom_count().

### Answer

geom_jitter() makes overlapping points visible by moving each point a small random distance. geom_count() leaves the points in place and instead counts the observations at each location, drawing larger points where more observations overlap.
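The counting behaviour of geom_count() can be seen in its computed variable n, which this sketch reads with layer_data().

```r
library(ggplot2)

# geom_count() collapses identical (cty, hwy) pairs into a single point
# and stores how many cars share that location in the computed variable n
counted <- layer_data(ggplot(mpg, aes(x = cty, y = hwy)) + geom_count())

nrow(counted)    # number of distinct (cty, hwy) locations
sum(counted$n)   # every car is still accounted for
```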

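4. What's the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.

### Answer

The default position adjustment is position_dodge2(), which places the boxes for different groups side by side instead of on top of each other.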

ggplot(data = mpg, mapping = aes(x = factor(year), y = hwy, color = drv)) + 
  geom_boxplot()
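To confirm what geom_boxplot() does when no position is given, this sketch compares the computed boxes with and without an explicit position_dodge2(). If the default is dodge2, the two data frames should be equal. The object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = factor(year), y = hwy, colour = drv))

# No position given: geom_boxplot() uses its default adjustment
default_boxes <- layer_data(base + geom_boxplot())

# position_dodge2() requested explicitly
explicit_boxes <- layer_data(base + geom_boxplot(position = position_dodge2()))

# Identical box positions mean the default is position_dodge2()
isTRUE(all.equal(default_boxes, explicit_boxes))
```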

### Exercise 3.9.1

1. Turn a stacked bar chart into a pie chart using coord_polar().

### Answer

ggplot(data = mpg) + 
  geom_bar(mapping = aes(x = class, fill = drv), width = 1)

ggplot(data = mpg) + 
  geom_bar(mapping = aes(x = class, fill = drv), width = 1) + coord_polar(theta = "y")

2. What does labs() do? Read the documentation.

### Answer

labs() sets the labels of a plot: the axis titles, legend titles, and the plot title, subtitle, and caption.

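3. What's the difference between coord_quickmap() and coord_map()?

### Answer

coord_map() projects a portion of the earth, which is approximately spherical, onto a flat 2D plane using a projection from the mapproj package. Because map projections do not in general keep straight lines straight, every geom has to be transformed, which takes considerable computation. coord_quickmap() is a quick approximation that does keep straight lines straight; it works best for smaller areas close to the equator.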

4. What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()

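### Answer

The relationship is a positive correlation between city and highway miles per gallon: cars that are efficient in the city are also efficient on the highway. geom_abline() with no arguments draws the line y = x (intercept 0, slope 1), and nearly all points lie above it, so highway mileage is higher than city mileage. coord_fixed() keeps one unit on the x axis the same length as one unit on the y axis, so that line really is at 45 degrees and the gap between the points and the line is not distorted.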
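Here are two quick numerical checks of the relationship: the share of cars above the y = x line that geom_abline() draws, and the slope of a simple linear fit of hwy on cty.

```r
library(ggplot2)  # provides the mpg dataset

# Share of cars whose highway mpg exceeds their city mpg,
# i.e. points that fall above the y = x reference line
mean(mpg$hwy > mpg$cty)

# Slope of a straight-line fit of hwy on cty: positive means
# higher city mileage goes with higher highway mileage
coef(lm(hwy ~ cty, data = mpg))[["cty"]]
```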

### Exercise 4.4

1. Why does this code not work?

my_variable <- 10
my_varıable
#> Error in eval(expr, envir, enclos): object ‘my_varıable’ not found

Look carefully! (This may seem like an exercise in pointlessness, but training your brain to notice even the tiniest difference will pay off when programming.)

### Answer

The second line uses a dotless ı (Unicode U+0131) instead of a regular i, so my_varıable is a different name from my_variable, and R cannot find an object with that name.
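The difference between the two names can be made visible by looking at character code points. Here is a minimal sketch in base R; the names typed and printed are illustrative.

```r
# "\u0131" is the dotless i, a different character from "i"
typed   <- "my_variable"
printed <- "my_var\u0131able"

typed == printed                  # FALSE: the names differ
utf8ToInt(substr(typed, 7, 7))    # 105, the code point of "i"
utf8ToInt(substr(printed, 7, 7))  # 305, the code point of the dotless i

my_variable <- 10
exists(typed)    # TRUE: this object was created
exists(printed)  # FALSE: no object has the dotless-i name
```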

2. Tweak each of the following R commands so that they run correctly:

library(tidyverse)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

### Answer

dota should be data, fliter should be filter, cyl = 8 should be cyl == 8, and diamond should be diamonds.

library(tidyverse)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

filter(mpg, cyl == 8)
## # A tibble: 70 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a6 quattro   4.2  2008     8 auto… 4        16    23 p     mids…
##  2 chevrolet    c1500 sub…   5.3  2008     8 auto… r        14    20 r     suv  
##  3 chevrolet    c1500 sub…   5.3  2008     8 auto… r        11    15 e     suv  
##  4 chevrolet    c1500 sub…   5.3  2008     8 auto… r        14    20 r     suv  
##  5 chevrolet    c1500 sub…   5.7  1999     8 auto… r        13    17 r     suv  
##  6 chevrolet    c1500 sub…   6    2008     8 auto… r        12    17 r     suv  
##  7 chevrolet    corvette     5.7  1999     8 manu… r        16    26 p     2sea…
##  8 chevrolet    corvette     5.7  1999     8 auto… r        15    23 p     2sea…
##  9 chevrolet    corvette     6.2  2008     8 manu… r        16    26 p     2sea…
## 10 chevrolet    corvette     6.2  2008     8 auto… r        15    25 p     2sea…
## # … with 60 more rows
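Two details are not visible in the output above: the corrected diamonds command, and what happens if = is used instead of == inside filter(). Here is a sketch of both that uses tryCatch() to capture the error instead of stopping; the object names are illustrative.

```r
library(ggplot2)  # provides the diamonds and mpg datasets
library(dplyr)

# Corrected command: the dataset is called diamonds, not diamond
big_diamonds <- filter(diamonds, carat > 3)
nrow(big_diamonds)

# A single = inside filter() is an error, not a comparison
single_equals <- tryCatch(
  {
    filter(mpg, cyl = 8)
    "ran"
  },
  error = function(e) "error"
)
single_equals
```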

3. Press Alt + Shift + K. What happens? How can you get to the same place using the menus?

### Answer

It opens the keyboard shortcut quick reference. You can get to the same place from the menus with Tools > Keyboard Shortcuts Help.