To get a first feel for ggplot2, let’s run some basic ggplot2 commands. Together, they build a plot of the mtcars dataset, which contains information about 32 cars from the 1974 Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables. Questions 1-7 are based on the mtcars data frame.

  1. Load the ggplot2 package using the library() command.
library(ggplot2)
  2. Use str() to explore the structure of the mtcars dataset.
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
  3. Execute the following code in the code chunk below. Describe what ggplot is doing with the data.
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point()

The code created a scatter plot of mpg (miles per gallon) against cyl (number of cylinders). Because the dataset only contains cars with 4, 6, or 8 cylinders, the points fall in three vertical columns at those cyl values.

The plot from #3 isn’t really satisfying. Although cyl (the number of cylinders) is categorical, it is classified as numeric in mtcars. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.

  4. Change the ggplot() code from #3 by wrapping factor() around cyl.
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point()
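
To see why wrapping cyl in factor() changes the plot, it can help to inspect the column directly. The following is a minimal sketch using only base R; the object names cyl_values and cyl_factor are hypothetical and are introduced here just for illustration.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
cyl_values <- sort(unique(mtcars$cyl))  # distinct cylinder counts in the data
cyl_factor <- factor(mtcars$cyl)        # same values, now stored as categories

class(mtcars$cyl)   # "numeric": ggplot2 treats it as a continuous variable
class(cyl_factor)   # "factor": ggplot2 treats it as a discrete variable
levels(cyl_factor)  # "4" "6" "8"
table(cyl_factor)   # number of cars in each cylinder category
```

With the factor version, the x-axis shows only the three categories instead of a continuous range running from 4 to 8.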

We will use several datasets throughout the class to showcase the concepts discussed in the weekly lectures. In the previous exercises, you already got to know mtcars. Let’s dive a little deeper to explore the three main layers in the grammar of graphics: data, aesthetics, and geom layers.

The mtcars dataset contains information about 32 cars from the 1974 Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.

Think about how the examples and concepts we discuss throughout the grammar of graphics lectures can be applied to your own datasets!

  5. ggplot2 has already been loaded for you in the code chunk below. Take a look at the first command. It plots mpg (miles per gallon) against wt (weight, in thousands of pounds). You don’t have to change anything about this command. Run the ggplot code to examine the graph that is produced.
#first ggplot call
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

In the second call to ggplot(), change the color argument in aes(). The color should depend on the displacement of the car engine, found in disp.

#second ggplot call

# disp is mapped to color
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point()

In the third call to ggplot(), change the size argument in aes(). The size should depend on the displacement of the car engine, found in disp.

#third ggplot call

# disp is mapped to size
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) +
  geom_point()

  6. After running the second and third calls to ggplot() in #5, were the legends for the color and size scales automatically generated? State Yes or No. YES
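
The legends appear because color and size were mapped to a column inside aes(). As a minimal sketch of the difference between mapping and setting an aesthetic, the example below compares a mapped color with a constant color given outside aes(). The object names (p_mapped, p_fixed, mapped_colours, fixed_colours) and the choice of "blue" are assumptions made for illustration only.

```r
library(ggplot2)

# Mapped: color depends on a column, so ggplot2 builds a scale and a legend.
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point()

# Set: a constant color given outside aes(); no scale is built, so no legend.
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")

# layer_data() returns the computed values that a layer will draw.
mapped_colours <- unique(layer_data(p_mapped, 1)$colour)
fixed_colours  <- unique(layer_data(p_fixed, 1)$colour)

length(mapped_colours)  # several distinct colours along the gradient
fixed_colours           # the single constant colour
```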

  7. In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale. Another argument of aes() is the shape of the points. There is a finite number of shapes that ggplot() can automatically assign to the points. However, if you try to map disp to shape in the code chunk below, you will receive an error. Run the code and examine the error that is produced.

library(ggplot2)

The code in the code chunk above gives an error. What does it mean?

  A. shape is not a defined argument

  B. shape only makes sense with categorical data and disp is continuous

  C. shape only makes sense with continuous data and disp is categorical

  D. shape is not a variable in your data frame

Type one and only one letter as your answer to #7. B
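
The chunk for #7 contains only the library() call; the plotting command itself is not shown. The following is a minimal sketch, assuming the missing command mapped disp to shape in the wt versus mpg scatter plot (this reconstruction is an assumption, not the original code). It captures the error message and then shows one way to avoid it by mapping a categorical variable instead. The names p_bad, p_ok, and shape_error are hypothetical.

```r
library(ggplot2)

# Assumed reconstruction of the missing chunk: disp (continuous) mapped to shape.
p_bad <- ggplot(mtcars, aes(x = wt, y = mpg, shape = disp)) +
  geom_point()

# The error is raised when the plot is built, so capture it at that step.
shape_error <- tryCatch(
  {
    ggplot_build(p_bad)
    NULL
  },
  error = function(e) conditionMessage(e)
)
shape_error  # the text of the error that printing p_bad would produce

# Mapping a categorical variable to shape works: one shape per level.
p_ok <- ggplot(mtcars, aes(x = wt, y = mpg, shape = factor(cyl))) +
  geom_point()
ok_shapes <- unique(layer_data(p_ok, 1)$shape)
length(ok_shapes)  # one shape for each of the three cyl levels
```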

Questions 8-15 use the diamonds_sample data frame

The diamonds data frame contains information on the prices and various attributes of almost 54,000 diamonds. It is included with the ggplot2 package. Among the variables included are carat (a measurement of the weight of the diamond) and price.

You will be working with a subset of this data frame, named diamonds_sample. Run the following code chunk to create the diamonds_sample data frame. You will use the diamonds_sample data frame to answer Questions 8-15. Do not use the full diamonds data frame.

diamonds_sample <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]
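
Because sample() draws rows at random, each run of the chunk above produces a different subset, so the output shown later will differ from run to run. If repeatable results are wanted, one option is to set a seed first. The sketch below assumes an arbitrary seed value of 123 and uses the hypothetical name diamonds_sample_seeded so that it does not overwrite diamonds_sample.

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(123)  # arbitrary seed, chosen only for illustration
rows_first <- sample(1:nrow(diamonds), 1000, replace = FALSE)

set.seed(123)  # resetting the same seed reproduces the same draw
rows_second <- sample(1:nrow(diamonds), 1000, replace = FALSE)

identical(rows_first, rows_second)  # TRUE

diamonds_sample_seeded <- diamonds[rows_first, ]  # hypothetical name
dim(diamonds_sample_seeded)  # 1000 rows, 10 columns
```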

Here you will use two common geom layer functions: geom_point() and geom_smooth(). We already discussed in class how these layers are added using the + operator.

  8. Use str() to explore the structure of the diamonds_sample data frame.
str(diamonds_sample)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1000 obs. of  10 variables:
##  $ carat  : num  1.21 0.31 2 1.31 1.05 0.3 0.78 0.28 0.41 1.58 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 2 4 4 4 4 5 2 4 5 4 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 5 2 4 5 2 6 1 3 1 6 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 3 6 2 2 2 7 4 4 3 5 ...
##  $ depth  : num  63.8 62.2 61.8 61.7 60.4 61.1 64 59.6 60.3 61.1 ...
##  $ table  : num  64 56 58 59 58 57 54 61 57 59 ...
##  $ price  : int  5407 1116 16262 4548 4742 552 3298 567 791 10618 ...
##  $ x      : num  6.72 4.31 8.05 7.03 6.59 4.32 5.82 4.28 4.82 7.52 ...
##  $ y      : num  6.63 4.28 7.99 6.98 6.55 4.36 5.86 4.25 4.86 7.44 ...
##  $ z      : num  4.26 2.67 4.96 4.32 3.97 2.65 3.74 2.54 2.92 4.57 ...
  9. Use the + operator to add geom_point() to the ggplot() command. This will tell ggplot2 to draw points on the plot.
# Add geom_point() with +
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_point()

  10. This problem is a continuation of #9. Use the + operator to add geom_point() and geom_smooth() to the ggplot() command. These just stack on each other! geom_smooth() will draw a smoothed line over the points.
# Add geom_point() and geom_smooth() with +
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
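
The message above reports the smoothing method that geom_smooth() picked automatically. As a minimal sketch, the method and formula can also be named explicitly, which avoids the automatic choice. A straight-line fit (method = "lm") is used here purely as an example, not as the method the assignment expects. The seed and the names diamonds_demo, p_lm, and smooth_line are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is repeatable
# Stand-in for diamonds_sample, built the same way (hypothetical name).
diamonds_demo <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

# Naming method and formula explicitly replaces the automatic choice.
p_lm <- ggplot(diamonds_demo, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# The second layer holds the fitted line evaluated along the x-axis,
# together with the bounds of the shaded confidence band.
smooth_line <- layer_data(p_lm, 2)
head(smooth_line[, c("x", "y", "ymin", "ymax")])
```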

  11. In #10, you built a scatter plot of the diamonds_sample dataset, with carat on the x-axis and price on the y-axis. geom_smooth() is used to add a smooth line. Copy and paste the code that created the scatter plot in #10, but show only the smooth line, no points.
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

  12. This problem is a continuation of #11. Show only the smooth line, but color it according to clarity by placing the argument color = clarity in the aes() function of your ggplot() call.
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

  13. This problem is a continuation of #12. You are going to construct a graph with translucent colored points.

Copy the ggplot() command from #12 (with clarity mapped to color). Remove the smooth layer. Add the points layer back in. Set alpha = 0.4 inside geom_point(); this makes the points 40% opaque (that is, 60% transparent).

library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.4)
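
Note that alpha is set as a constant inside geom_point(), while color is mapped inside aes(). The following sketch, under the assumption of an arbitrary seed and a hypothetical stand-in data frame diamonds_demo, inspects the built layer to show that every point receives the same alpha while the colours vary with clarity.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is repeatable
# Stand-in for diamonds_sample, built the same way (hypothetical name).
diamonds_demo <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

p_alpha <- ggplot(diamonds_demo, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.4)

point_data <- layer_data(p_alpha, 1)
unique(point_data$alpha)           # the constant 0.4 applies to every point
length(unique(point_data$colour))  # one colour per clarity level in the sample
```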

In #14 you are going to explore some of the different grammatical elements of ggplot2. You will start by creating a ggplot object from the diamonds_sample dataset. Next, you will add layers onto this object to build informative graphics.

  14. This problem can be broken into three parts.

    a. Define the data (diamonds_sample) and aesthetics layers. Map carat to the x-axis and price to the y-axis. Assign the result to an object named dia_plot.

    b. Using +, add a geom_point() layer (with no arguments) to the dia_plot object. This can be on a single line or across multiple lines.

    c. You can also call aes() within the geom_point() function. Map clarity to the color argument in this way.

library(ggplot2)

#part a
dia_plot <- ggplot(diamonds_sample, aes(x = carat, y = price))

#part b
dia_plot + geom_point()

#part c
dia_plot + geom_point(aes(color = clarity))

  15. This problem is a continuation of #14. You have created an object named dia_plot. This problem can be broken into three parts.

    a. Update dia_plot so that it contains all the functions needed to make a scatter plot, using geom_point() for the geom layer. Set alpha = 0.2.

    b. Using +, plot the dia_plot object with a geom_smooth() layer on top. You do not want any error shading, which can be achieved by setting se = FALSE in geom_smooth().

    c. Modify the geom_smooth() function from part b so that it contains aes() and maps clarity to the col argument.

#part a
#Expand dia_plot by adding geom_point() with alpha set to 0.2.

dia_plot <- dia_plot + geom_point(alpha = 0.2)

#part b
#Plot dia_plot with an additional geom_smooth() with se set to FALSE.

dia_plot + geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

#part c
#Copy the command from part b and add aes() with the correct mapping to geom_smooth().

dia_plot + geom_smooth(aes(color = clarity), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
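
Aesthetics given in ggplot() are inherited by every layer, while aesthetics given inside a geom's own aes() apply to that layer only. The sketch below illustrates this with a hypothetical stand-in data frame (diamonds_demo, built with an arbitrary seed). A straight-line fit (method = "lm") is swapped in for the automatically chosen smoother only to keep the example quick; it is not what the exercise output above used.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the sketch is repeatable
# Stand-in for diamonds_sample, built the same way (hypothetical name).
diamonds_demo <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

# Global aesthetics: x and y are shared by all layers added later.
base_plot <- ggplot(diamonds_demo, aes(x = carat, y = price))

# Local aesthetic: color = clarity applies to the smooth layer only.
p_layers <- base_plot +
  geom_point(alpha = 0.2) +
  geom_smooth(aes(color = clarity), method = "lm", formula = y ~ x, se = FALSE)

point_colours  <- unique(layer_data(p_layers, 1)$colour)
smooth_colours <- unique(layer_data(p_layers, 2)$colour)

length(point_colours)   # 1: the points keep the single default colour
length(smooth_colours)  # one colour per clarity level present in the sample
```

Moving color = clarity into the ggplot() call instead would color both the points and the smooth lines, which is the difference between parts b and c of #14 and #15.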