---
title: 'R for Data Science: Data visualisation'
output:
  html_notebook: default
  pdf_document: default
---

### 3.2.4 Exercises
```{r}
library(tidyverse)
```


#### 1. Run ggplot(data = mpg). What do you see?

An empty grey canvas. ggplot() only initialises the plot with the data. No layers (geoms) have been added yet, so nothing is drawn.

#### 2. How many rows are in mtcars? How many columns?
```{r}
dim(mtcars)
```

mtcars has 32 rows and 11 columns.

#### 3. What does the drv variable describe? Read the help for ?mpg to find out.
```{r}
?mpg
# drv: the type of drive train, where
# f = front-wheel drive, r = rear wheel drive, 4 = 4wd
```

#### 4. Make a scatterplot of hwy vs cyl.
```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = cyl, y = hwy))
```

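#### 5. What happens if you make a scatterplot of class vs drv? Why is the plot not useful?

Both class and drv are categorical, so every point falls on a grid of class/drv combinations. The plot only shows which combinations exist. All cars that share a combination are drawn on top of each other, so it says nothing about how many cars each combination contains.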

```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = class, y = drv))
```
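
A plot that does show the missing information is a count plot. The sketch below swaps geom_point() for geom_count(), a ggplot2 geom that was not part of the original answer, which sizes each point by the number of cars with that class/drv combination. The name p_count is just illustrative.

```{r}
library(ggplot2)

# geom_count() counts the overlapping observations at each class/drv
# combination and maps that count (n) to the point size
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
p_count
```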

### 3.3.1 Exercises
#### 1. What’s gone wrong with this code? Why are the points not blue?

```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```

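The points are not blue because color = "blue" is placed inside aes(). Everything inside aes() is treated as data, so ggplot2 creates a constant variable whose only value is the string "blue" and maps it through the default discrete colour scale. All points get that scale's first colour (a reddish hue), and the legend shows a single entry labelled "blue". To set a fixed colour, pass color as an argument to geom_point() outside aes():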

```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```
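
One way to see the difference is to inspect the colours ggplot2 actually assigns. The sketch below uses layer_data() to read the built data of each plot; p_inside and p_outside are illustrative names.

```{r}
library(ggplot2)

# Inside aes(): "blue" is a one-level data value, coloured by the default scale
p_inside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
# Outside aes(): "blue" is a fixed setting applied to every point
p_outside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# The colours ggplot2 assigned in each case
unique(layer_data(p_inside)$colour)
unique(layer_data(p_outside)$colour)
```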


#### 2. Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?

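The type of each column is shown directly under its name when the tibble is printed. `<chr>` (character) columns are categorical: manufacturer, model, trans, drv, fl and class. `<dbl>` (double) and `<int>` (integer) columns are numeric and can be treated as continuous: displ, year, cyl, cty and hwy. Note that year and cyl take only a few distinct values and often behave more like categories.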

```{r}
head(mpg, 1)
```
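
The same split can also be computed rather than read off the printout. This is a small sketch that groups the columns of mpg by their storage type; col_types is an illustrative name.

```{r}
library(ggplot2)

# Storage type of every column of mpg
col_types <- vapply(mpg, function(col) class(col)[1], character(1))

names(col_types)[col_types == "character"]                # categorical
names(col_types)[col_types %in% c("numeric", "integer")]  # numeric
```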

#### 3. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?

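With a continuous variable, colour gets a gradient colour bar and size gets a legend with a few representative sizes. Shape cannot represent a continuous variable at all, and ggplot2 raises an error, which is why shape is mapped to the categorical drv below. With a categorical variable, each aesthetic gets a legend with one entry per category.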

```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = cyl, size = hwy, shape = drv))
```
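
To check the claim about shape, the sketch below maps the continuous cty to shape and catches the error. The error appears when the plot is built, so ggplot_build() is wrapped in tryCatch(); p_shape and res are illustrative names.

```{r}
library(ggplot2)

# cty is continuous (integer miles per gallon); shape has no continuous scale
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = cty))

# Building the plot triggers the error, so catch it there
res <- tryCatch(ggplot_build(p_shape), error = function(e) e)
inherits(res, "error")
conditionMessage(res)
```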


#### 4. What happens if you map the same variable to multiple aesthetics?

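Both aesthetics are applied and encode the same information, here cyl as both colour and point size. This is redundant, but it can make differences easier to see.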

```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = cyl, size = cyl))
```

#### 5. What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)

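stroke sets the width of a point's border. It is most useful with shapes 21 to 25, which have a separate border (coloured with colour) and interior (coloured with fill): size controls the filled part and stroke the thickness of the border.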
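
A minimal sketch of stroke on a fillable shape: shape 21 with the interior filled by drv, a black border and a thick stroke. The particular size and stroke values are arbitrary choices for illustration.

```{r}
library(ggplot2)

# Shape 21 is a fillable circle: colour sets the border, fill the interior,
# size the filled part and stroke the width of the border
p_stroke <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, fill = drv),
             shape = 21, colour = "black", size = 3, stroke = 2)
p_stroke
```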

#### 6. What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)?

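ggplot2 evaluates the expression for every row of the data, so displ < 5 becomes a logical vector of TRUE and FALSE values. Colour is then mapped to this vector like any discrete variable: points are coloured by whether displ is below 5, and the legend is titled "displ < 5".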

```{r}
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))
```
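
The expression mapping above behaves like computing the logical column first and mapping colour to it. The sketch below does that explicitly with dplyr::mutate(); the column name small_engine is hypothetical.

```{r}
library(ggplot2)
library(dplyr)

# Equivalent to aes(colour = displ < 5): compute the logical column first,
# then map colour to it as an ordinary discrete variable
mpg_flag <- mutate(mpg, small_engine = displ < 5)
ggplot(data = mpg_flag) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = small_engine))

# Number of cars on each side of the threshold
count(mpg_flag, small_engine)
```
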

### 3.5.1 Exercises

#### 1. What happens if you facet on a continuous variable?

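ggplot2 treats the continuous variable as categorical and draws one panel for each distinct value. Faceting on displ therefore produces a large number of panels, most of which contain only a few points.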

```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = cyl, y = hwy)) +
  facet_wrap(~ displ)
```
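
A common workaround is to bin the continuous variable before faceting. The sketch below uses ggplot2's cut_number() to split displ into 3 groups with roughly equal numbers of cars; the number of bins is an arbitrary choice, and p_raw and p_binned are illustrative names.

```{r}
library(ggplot2)
library(dplyr)

# Faceting directly on displ: one panel per distinct value
p_raw <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy)) +
  facet_wrap(~ displ)
n_distinct(mpg$displ)

# Binning displ first keeps the number of panels small
p_binned <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy)) +
  facet_wrap(~ cut_number(displ, 3))
p_binned
```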


#### 2. What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?

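An empty cell means that mpg contains no cars with that combination of drv and cyl, e.g. there are no rear-wheel-drive (r) cars with 4 or 5 cylinders. There are no 7-cylinder cars at all, so no 7-cylinder column appears. The scatterplot of drv vs cyl shows the same information: each point marks a combination that exists, and the positions without a point correspond to the empty facets.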

```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
```


```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = cyl))
```
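
The empty cells can also be listed as a table. This sketch counts the drv/cyl combinations with dplyr::count() and then uses tidyr::complete() to add the missing combinations with a count of 0; combos is an illustrative name.

```{r}
library(ggplot2)
library(dplyr)
library(tidyr)

# Every drv/cyl combination that actually occurs, with the number of cars
combos <- count(mpg, drv, cyl)
combos

# Add the combinations that never occur with n = 0: these are the empty facets
complete(combos, drv, cyl, fill = list(n = 0L))
```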


#### 3. What plots does the following code make? What does . do?

The . means "do not facet in this dimension". facet_grid(drv ~ .) splits the plot into rows only, one per value of drv, stacked vertically: the x-axis is drawn once at the bottom and the y-axis is repeated for each panel. facet_grid(. ~ cyl) splits the plot into columns only, one per value of cyl, side by side: the y-axis is drawn once on the left and the x-axis is repeated for each panel.

```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)
```

```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
```
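
Recent versions of ggplot2 also let facet_grid() take the rows and cols arguments built with vars() instead of a formula. In this sketch, leaving cols out plays the role of the . in drv ~ .; p_rows is an illustrative name.

```{r}
library(ggplot2)

# Same as facet_grid(drv ~ .): facet by drv in rows, no column variable
p_rows <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(rows = vars(drv))
p_rows
```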

#### 4. Take the first faceted plot in this section. What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?

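Faceting makes it easier to examine each class on its own, because its points are not mixed with those of other classes. Colouring makes it easier to see how the classes cluster relative to each other in a single panel, which is harder when they sit in separate panels. With a larger dataset you are more likely to care about the overall clustering than about the individual point clouds. However, with many points or many classes the colours overlap and become hard to tell apart, which favours faceting again.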

```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
```


```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)
```

#### 5. Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol variables?

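nrow and ncol set the number of rows and columns in which facet_wrap() arranges the panels. facet_grid() has no such arguments because its layout is fixed by the formula: there is one row per level of the row variable and one column per level of the column variable. Other facet_wrap() options that affect the panels include scales, as.table, dir, labeller and strip.position.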

```{r}
?facet_wrap
# nrow, ncol: Number of rows and columns.
# scales: Should scales be fixed ("fixed", the default), free ("free"), or free in one dimension ("free_x", "free_y")?
# shrink: If TRUE, will shrink scales to fit output of statistics, not raw data. If FALSE, will be range of raw data before statistical summary.
```
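
A small sketch of the layout options in action: the seven classes arranged in 4 columns (so 2 rows), with scales = "free_y" giving each panel its own y range. The ncol value is an arbitrary choice; p_wrap is an illustrative name.

```{r}
library(ggplot2)

# 7 classes in 4 columns -> 2 rows; each panel gets its own y range
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, ncol = 4, scales = "free_y")
p_wrap

# Row and column position that facet_wrap() chose for each panel
ggplot_build(p_wrap)$layout$layout[, c("PANEL", "ROW", "COL")]
```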


#### 6. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?

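Plots are usually wider than they are tall, so there is more room for panels side by side than stacked. Putting the variable with more levels in the rows (here class, with 7 levels, against 3 for drv) squeezes each panel vertically. This compresses the y-axis and makes it hard to read the values of the points, as the plot below shows.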

```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(class ~ drv)
```
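
For comparison, the sketch below swaps the two variables so that the 7-level class goes in the columns and the panels keep a usable height; p_cols is an illustrative name.

```{r}
library(ggplot2)

# class (7 levels) in the columns, drv (3 levels) in the rows
p_cols <- ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ class)
p_cols
```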

### 3.6.1 Exercises

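#### 1. What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

Line chart: geom_line(); boxplot: geom_boxplot(); histogram: geom_histogram(); area chart: geom_area(). For example, a line chart of hwy against displ with the points overlaid:
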
```{r}
ggplot(data = mpg) +
  geom_line(mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(x = displ, y = hwy))
```
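
Sketches of the other three geoms follow. The choice of variables and the binwidth of 2 are arbitrary, and the area chart uses the economics dataset that ships with ggplot2 because an area chart needs an ordered x variable such as time.

```{r}
library(ggplot2)

# Boxplot: distribution of hwy within each class
p_box <- ggplot(data = mpg) + geom_boxplot(mapping = aes(x = class, y = hwy))
# Histogram: distribution of a single continuous variable
p_hist <- ggplot(data = mpg) + geom_histogram(mapping = aes(x = hwy), binwidth = 2)
# Area chart: a line chart with the area below the line filled
p_area <- ggplot(data = economics) + geom_area(mapping = aes(x = date, y = unemploy))
p_box
p_hist
p_area
```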


#### 2. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)
```

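I did not expect multiple lines. There is one smooth line per drive type because color = drv is set in ggplot() and therefore inherited by geom_smooth(), which groups the data by the discrete colour variable and fits a separate curve to each group.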

#### 3. What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?

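show.legend = FALSE hides the legend for that layer. If you remove it, the default (NA) applies and a legend is drawn whenever the layer maps an aesthetic such as colour. Actually, it was not used earlier in the chapter; it only appears later, in section 3.9 (Coordinate systems).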
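
A minimal sketch comparing the same smooth plot with and without the layer's legend. method and formula are given explicitly only to silence the message geom_smooth() prints; p_legend and p_nolegend are illustrative names.

```{r}
library(ggplot2)

# Default show.legend: a colour legend for drv is drawn
p_legend <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, color = drv),
              method = "loess", formula = y ~ x, se = FALSE)
# show.legend = FALSE: same lines, no legend for this layer
p_nolegend <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, color = drv),
              method = "loess", formula = y ~ x, se = FALSE,
              show.legend = FALSE)
p_legend
p_nolegend
```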

#### 4. What does the se argument to geom_smooth() do?

se controls whether a confidence interval is drawn around the smoothed line (the grey band). It is TRUE by default; se = FALSE draws only the line.
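
The band comes from extra columns the smoothing stat computes. This sketch draws both variants and, in the test, checks that the se = TRUE layer carries ymin and ymax. level = 0.95 is the default confidence level, written out only for visibility; p_se and p_nose are illustrative names.

```{r}
library(ggplot2)

# se = TRUE: the stat also returns ymin/ymax, drawn as the grey band
p_se <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE, level = 0.95)
# se = FALSE: only the fitted line is drawn
p_nose <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
p_se
p_nose
```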

#### 5. Will these two graphs look different? Why/why not?

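No. In the first plot the data and mapping are given once in ggplot() and inherited by both layers; in the second they are repeated in each layer. The two plots are identical, but the first avoids the duplication.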

```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
```

```{r}
ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
```
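
To confirm that the two versions draw the same thing, the sketch below compares the built data of each layer with layer_data(). method and formula are set explicitly only to silence the geom_smooth() message; p_global and p_local are illustrative names.

```{r}
library(ggplot2)

# Global data and mapping, inherited by both layers
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
# Same data and mapping repeated in each layer
p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x)

# Compare the positions each layer actually draws
all.equal(layer_data(p_global, 1)[, c("x", "y")], layer_data(p_local, 1)[, c("x", "y")])
all.equal(layer_data(p_global, 2)[, c("x", "y", "ymin", "ymax")],
          layer_data(p_local, 2)[, c("x", "y", "ymin", "ymax")])
```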

#### 6. Recreate the R code necessary to generate the following graphs

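Note: loading cowplot (at least in older versions) changes the default ggplot2 theme and removes the grey background, which is why theme_set(theme_gray()) is called below to restore it.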

```{r}
#install.packages("gridExtra")
#install.packages("cowplot")
```

```{r}
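library(cowplot)

# 1: points with a single smooth line, no confidence band
p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() +
  geom_smooth(se = FALSE)

# 2: one smooth line per drv value, all in the default colour
p2 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() +
  geom_smooth(mapping = aes(group = drv), se = FALSE)

# 3: points and smooth lines both coloured by drv
p3 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() +
  geom_smooth(se = FALSE)

# 4: points coloured by drv, a single smooth line for all cars
p4 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(se = FALSE)

# 5: points coloured by drv, one smooth line per drv with its own linetype
p5 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = drv)) +
  geom_smooth(se = FALSE, mapping = aes(linetype = drv))

# 6: points coloured by drv, overdrawn with a white ring (shape 21, no fill)
p6 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = drv)) + 
  geom_point(shape = 21, color = "white", stroke = 1)

# Restore the grey default theme, then arrange the six plots in a 3 x 2 grid
theme_set(theme_gray())
plot_grid(p1, p2, p3, p4, p5, p6, labels = c("1", "2", "3", "4", "5", "6"), ncol = 2, nrow = 3)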
```


```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color=drv)) +
  geom_point(shape = 21, color = "white", stroke = 2)
```

