---
title: 'Module 2 Exercises: Basic graphics with ggplot'
output:
  html_notebook: default
  html_document: default
  pdf_document: default
---
Note: These are exercises from Wickham (2016, 2nd ed).

### Set up

```{r load library}
library(ggplot2)
```
To view the data set, type `mpg`. To browse it more comfortably, use `View(mpg)` (note the capital V).

```{r}
mpg
```


## Examples and exercises Part 1
Let's look at the components for creating a chart with ggplot2, using a scatterplot as the example.
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
```
* `mpg` is a data set that comes with ggplot2.
* Aesthetic mappings: engine displacement (`displ`) is mapped to the x axis, highway mileage (`hwy`) to the y axis.
* A layer with points as the geometry is added.

The **pattern** shown here is fundamental for ggplot2:

* data and aesthetic mappings are provided in `ggplot()`, then
* layers are added with `+`.
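
Because layers are added with `+`, a plot can also be built up step by step. The following is a minimal sketch; the object names `base` and `scatter` are chosen here for illustration and are not part of the original exercises.

```{r}
library(ggplot2)

# Data and aesthetic mappings only: no layer yet, so no points are drawn.
base <- ggplot(mpg, aes(x = displ, y = hwy))

# Adding a layer returns a new plot object; `base` itself is unchanged.
scatter <- base + geom_point()

scatter
```

Printing `base` on its own shows empty axes, which makes the division of labour between `ggplot()` and the layer visible.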

A shorter version of the above is:
```{r}
ggplot(mpg, aes(displ, hwy)) + 
  geom_point()
```
This produces exactly the same output as the longer version above. 

### Exercises
The solutions are provided by running the code, but make sure you first try to figure things out for yourself. You can run commands in the Console, or by creating an R Script, typing commands, selecting with the mouse what you want to execute, and clicking **Run**.

1. How would you plot the relationship between `cty`, the average city mileage, and `hwy`, the average highway mileage? How would you describe this relationship?
2. Describe the data, aesthetic mappings, and layers used for each of the following plots. Sometimes you will have to guess, but common sense should help. Try to imagine what the graph will look like before running the command.
     + `ggplot(mpg, aes(cty, hwy)) + geom_point()`
     + `ggplot(diamonds, aes(carat, price)) + geom_point()`
     + `ggplot(economics, aes(date, unemploy)) + geom_line()`
     + `ggplot(mpg, aes(cty)) + geom_histogram()`

-----
### Solutions
#### Exercise 1

1. How would you plot the relationship between `cty`, the average city mileage, and `hwy`, the average highway mileage? How would you describe this relationship?

```{r}
ggplot(mpg, aes(cty, hwy)) + 
  geom_point()
```
The point to make here is that this strongly linear relationship is largely driven by a third factor, engine size. That is to say, the description "the higher/lower `cty`, the higher/lower `hwy`" is to be taken as strictly descriptive, not causal.
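
One way to back up this description is sketched below: the correlation coefficient quantifies the linear association, and mapping engine displacement to colour makes the third variable visible. Using `displ` as the measure of engine (motor) size is an assumption made for this sketch.

```{r}
library(ggplot2)

# Strength of the linear association between city and highway mileage.
cor(mpg$cty, mpg$hwy)

# Same scatterplot, with engine displacement shown as colour.
ggplot(mpg, aes(cty, hwy, colour = displ)) +
  geom_point()
```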

#### Exercise 2
```{r}
ggplot(mpg, aes(cty, hwy)) + geom_point()
```

```{r}
ggplot(diamonds, aes(carat, price)) + geom_point()
```

```{r}
ggplot(economics, aes(date, unemploy)) + geom_line()
```

```{r}
ggplot(mpg, aes(cty)) + geom_histogram()
```


## Examples and exercises Part 2

### Aesthetic attributes

To add variables to a plot, we need to map them onto aesthetics. In two dimensions, we can use the x and the y axis, as shown in the scatterplots above. To add a third (or fourth, etc.) variable, we need to use aesthetics such as shape, colour, and size. In the examples below, `class` is the type of car (such as pickup), `drv` is the drivetrain, that is, front-wheel (f), rear-wheel (r), or four-wheel (4) drive, and `cyl` is the number of cylinders.

As always, try to imagine what the plot will look like in each case before you click the `Run Current Chunk` button.

```{r}
ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()
ggplot(mpg, aes(displ, hwy, shape = drv)) + 
  geom_point()
ggplot(mpg, aes(displ, hwy, size = cyl)) + 
  geom_point()
```
Do you find the scales provided by ggplot2 useful in these instances? You probably do, because they allow you to translate the aesthetics (colour, size, shape) back into values of the variable. ggplot2 is also pretty smart about the choice of scales, but of course these defaults can all be overridden.
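
As a sketch of what overriding a default looks like, a scale function can be added with `+` just like a layer. The palette and the size range below are arbitrary choices for illustration, and the names `p_colour` and `p_size` are introduced only for this example.

```{r}
library(ggplot2)

# Replace the default colour scale for a discrete variable.
p_colour <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")
p_colour

# Restrict point sizes to a narrower range than the default.
p_size <- ggplot(mpg, aes(displ, hwy, size = cyl)) +
  geom_point() +
  scale_size(range = c(1, 4))
p_size
```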

To set the colour to a specific value, such as "blue", the colour needs to be outside of the `aes()` expression, in a layer (remember, layers are described following the `+` sign):

```{r}
ggplot(mpg, aes(displ, hwy)) + geom_point(colour = "blue")
```
**Question:** ggplot2 does not provide a scale (a legend) with this graph. Why? Is this a bug? Should you attempt to provide one manually?
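
The difference between setting and mapping an aesthetic can be made concrete with a small comparison. This is a sketch; the names `p_set` and `p_map` are introduced here for illustration only.

```{r}
library(ggplot2)

# Setting: colour is a fixed property of the layer, so there is nothing
# for a legend to explain.
p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(colour = "blue")

# Mapping: inside aes(), "blue" is treated as a data value (a constant
# variable), so ggplot2 scales it and draws a legend for it.
p_map <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = "blue"))

p_set
p_map
```

In the second plot the points are not blue: the string is data, and the colour scale assigns it a colour from its own palette.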

### Exercises Part 2

From Wickham 2.4.1, page 16. Formulate them in a more closed format, so that students get at least one concrete task before exploring their own combinations, which definitely should be encouraged.

1. What happens when you map colour to a continuous variable, say highway mileage (`hwy`)?
2. What happens when you map shape to `hwy` (or other continuous variables)? Why?
3. What happens when you use more than one aesthetic in a plot?
4. What happens when you map `trans` (values such as "auto(l5)" and "manual(m5)") to shape? Why?
5. How is drivetrain (`drv`, with values f, r, 4) related to fuel economy? How is it related to engine size and class?

### Solutions for exercises Part 2

Go here -- Dalal?

## Examples and exercises Part 3

### Examples

A **boxplot**, or box-and-whiskers plot, summarizes the distribution of scores of a variable. Here, `hwy` is summarized separately for each value of `drv`:

```{r}
ggplot(mpg, aes(drv, hwy)) + 
  geom_boxplot()
```
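
To see which numbers a boxplot condenses, the five-number summary can be computed directly. This is a base-R sketch, with `hwy_by_drv` as an illustrative name; note that `geom_boxplot()` draws points far from the box as separate outliers, so its whiskers need not reach the minimum and maximum.

```{r}
library(ggplot2)  # provides the mpg data set

# Minimum, quartiles, and maximum of hwy for each drivetrain type.
hwy_by_drv <- tapply(mpg$hwy, mpg$drv, quantile)
hwy_by_drv
```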

**Histograms** and **frequency polygons** show the distribution of scores of a single numeric variable.

```{r}
ggplot(mpg, aes(hwy)) + 
  geom_histogram()
ggplot(mpg, aes(hwy)) + 
  geom_freqpoly()
```

Note that the y axis shows *counts*, that is, frequencies, *not values* (scores).
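
Running the histogram above prints a message that `stat_bin()` used 30 bins by default. A sketch with an explicit bin width (the value 2 is an arbitrary choice, and `p_hist` is an illustrative name) also lets us check that the bar heights really are counts:

```{r}
library(ggplot2)

p_hist <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 2)
p_hist

# The bar heights are counts, so they add up to the number of rows in mpg.
sum(layer_data(p_hist)$count) == nrow(mpg)
```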

A **bar chart** is the analog of a histogram, but for discrete variables:

```{r}
ggplot(mpg, aes(manufacturer)) + 
  geom_bar()
```
As a final example, we look at **time line** plots. Here, the x axis shows time (e.g., years), and the y axis shows measurements for numeric variables, or counts (frequencies) for categorical data. For the next example we use the `economics` data set, which contains basic economic data for the US over time, such as the number of unemployed people (`unemploy`).

```{r}
ggplot(economics, aes(date, unemploy / pop)) +
  geom_line()
```
**Question:** How did we overcome the problem that the original data set, `economics`, does not contain values of the unemployment *rate* directly, but only the number of unemployed people and the population size?

To show just one example of overriding defaults, let's look at the axis labels. While 'date' is clear enough, the label 'unemploy/pop' is a bit mysterious. Let's change it to "unemployment rate":

```{r}
ggplot(economics, aes(date, unemploy / pop)) +
  geom_line() + 
  ylab("unemployment rate")
```
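
An alternative to computing the rate inside `aes()` is to add it to the data as a column first. In this sketch the names `econ` and `rate` are introduced for illustration only; `labs()` sets several labels in one call.

```{r}
library(ggplot2)

# Work on a copy so the original economics data set stays untouched.
econ <- economics
econ$rate <- econ$unemploy / econ$pop

ggplot(econ, aes(date, rate)) +
  geom_line() +
  labs(x = "date", y = "unemployment rate")
```

Computing inside `aes()` is more compact for a one-off plot, while a stored column is convenient when the same derived variable is reused across several plots.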

### Exercises Part 3

Go here

### Solutions for Exercises Part 3

Go here. 
