Solve or answer the following exercises in Section 8.15 on pages 133-135:

Exercises 2-7 and 14

Setup

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(dslabs)
data(heights)
data(murders)
p <- murders %>% ggplot()
class(p)

## [1] "gg"     "ggplot"

8.15.2

Remember that to print an object you can use the command print or simply type the object's name. For example:

x <- 2
x
print(x)

Print the object p defined in exercise one and describe what you see.

A. Nothing happens.

B. A blank slate plot.

C. A scatter plot.

D. A histogram.

A: B. A blank slate plot.

print(p)
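The plot is blank because p holds the data but no geometry yet. A minimal sketch of one way to check this, assuming the `layers` component of a ggplot object can be read with `$` (this is an internal detail of ggplot2 and could differ between versions; the names n_layers and p_points are only for illustration):

```r
library(dplyr)
library(ggplot2)
library(dslabs)
data(murders)

# Same object as in exercise one: data attached, but no geoms added
p <- murders %>% ggplot()

# Count the layers; with zero layers there is nothing to draw on the canvas
n_layers <- length(p$layers)
n_layers

# For comparison, adding geom_point() gives the plot one layer
p_points <- murders %>% ggplot(aes(population, total)) + geom_point()
n_layers_points <- length(p_points$layers)
n_layers_points
```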

8.15.3

Using the pipe %>%, create an object p but this time associated with the heights dataset instead of the murders dataset.

p <- heights %>% ggplot()

8.15.4

What is the class of the object p you have just created?

class(p)

## [1] "gg"     "ggplot"
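The exact class vector printed by class(p) may vary with the installed ggplot2 version, so a check with inherits() can be a more robust way to confirm that p is a ggplot object. A small sketch (the name is_plot is only for illustration):

```r
library(dplyr)
library(ggplot2)
library(dslabs)
data(heights)

# Recreate the object from exercise 3
p <- heights %>% ggplot()

# TRUE if "ggplot" appears anywhere in the class vector of p
is_plot <- inherits(p, "ggplot")
is_plot
```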

8.15.5

Now we are going to add a layer and the corresponding aesthetic mappings. For the murders data we plotted total murders versus population sizes. Explore the murders data frame to remind yourself what the names of these two variables are and select the correct answer.

Hint: Look at ?murders.

A. state and abb.

B. total_murders and population_size.

C. total and population.

D. murders and size.

A: C. total and population.

?murders

## starting httpd help server ... done
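Besides the help page, the column names can be read directly from the data frame. A short sketch of that approach (the name vars is only for illustration):

```r
library(dslabs)
data(murders)

# Column names of the murders data frame
vars <- names(murders)
vars

# First few rows, to see what each column holds
head(murders)
```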

8.15.6

To create the scatter plot we add a layer with geom_point. The aesthetic mappings require us to define the x-axis and y-axis variables respectively. So the code looks like this:

murders %>% ggplot(aes(x = , y = )) +
    geom_point()

except we have to define the two variables x and y. Fill this out with the correct variable names.

murders %>% ggplot(aes(x = population  , y = total)) +
    geom_point()

8.15.7

Note that if we don’t use argument names, we can obtain the same plot by making sure we enter the variable names in the right order like this:

murders %>% ggplot(aes(population, total)) + 
    geom_point()

Remake the plot but now with total on the x-axis and population on the y-axis.

murders %>% ggplot(aes(total, population)) + 
    geom_point()
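The reason the order matters is that the first two arguments of aes() are x and y, so unnamed variables are matched by position. A small sketch that inspects the mappings without drawing anything (the object names here are only for illustration):

```r
library(ggplot2)

# Named and positional forms of the same mapping
named      <- aes(x = population, y = total)
positional <- aes(population, total)

# Both mappings have the aesthetics "x" and "y", in that order
names(named)
names(positional)

# Swapping the variables keeps the aesthetic names; only what they point to changes
swapped <- aes(total, population)
names(swapped)
```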

8.15.14

Now we are going to change the x-axis to a log scale to account for the fact that the distribution of population is skewed. Let’s start by defining an object p holding the plot we have made up to now:

p <- murders %>%
    ggplot(aes(population, total, label = abb, color = region)) +
    geom_label()

To change the x-axis to a log scale we learned about the scale_x_log10() function. Add this layer to the object p to change the scale and render the plot.

p <- murders %>%
    ggplot(aes(population/10^6, total, label = abb, color = region)) +
    geom_label()

p + scale_x_log10() + 
    scale_y_log10() +
    xlab("Population in Millions (log scale)") +
    ylab("Total Murders (log scale)") +
    ggtitle("US gun murders by state for 2010")
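To see why a log scale helps here, one rough check of the skew is to compare the mean and median of population: a few very large states pull the mean above the median. This is only a sketch, and the names pop_mean, pop_median, and log_range are for illustration:

```r
library(dslabs)
data(murders)

# A mean well above the median suggests a right-skewed distribution
pop_mean   <- mean(murders$population)
pop_median <- median(murders$population)
c(mean = pop_mean, median = pop_median)

# On the log10 scale the populations span a much narrower range of values
log_range <- range(log10(murders$population))
log_range
```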

Assignment: ggplot2

Joshua Farina