To get a first feel for ggplot2, let’s run some basic ggplot2 commands. Together, they build a plot of the mtcars dataset, which contains information about 32 cars (1973-74 models) taken from the 1974 Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables. Questions 1-7 are based on the mtcars data frame.
library(ggplot2)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_point()
The code creates a scatter plot of mpg (miles per gallon) against cyl (number of cylinders). Because every car in the dataset has 4, 6, or 8 cylinders, the points fall only at those three cyl values.
The plot from #3 isn’t really satisfying. Although cyl (the number of cylinders) is categorical, it is classified as numeric in mtcars. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
geom_point()
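As a quick check of why the conversion is needed, the sketch below confirms that cyl is stored as a number and shows how factor() turns its distinct values into discrete levels. The names cyl_class and cyl_counts are illustrative only and are not part of the assignment.

```r
# cyl is stored as a plain numeric column in mtcars
cyl_class <- class(mtcars$cyl)
print(cyl_class)

# factor() turns the distinct values into discrete levels;
# table() counts how many cars fall into each level
cyl_counts <- table(factor(mtcars$cyl))
print(cyl_counts)
```

Wrapping cyl in factor() inside aes() leaves mtcars itself unchanged; only the plot treats the variable as categorical.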
We will use several datasets throughout the class to showcase the concepts discussed in the weekly lectures. In the previous exercises, you already got to know mtcars. Let’s dive a little deeper to explore the three main layers in the grammar of graphics: the data, aesthetics, and geom layers.
The mtcars dataset contains information about 32 cars (1973-74 models) taken from the 1974 Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.
Think about how the examples and concepts we discuss throughout the grammar of graphics lectures can be applied to your own datasets!
#first ggplot call
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point()
In the second call of ggplot() change the color argument in aes(). The color should be dependent on the displacement of the car engine, found in disp.
# second ggplot call: color mapped to disp
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
geom_point()
In the third call of ggplot() change the size argument in aes(). The size should be dependent on the displacement of the car engine, found in disp.
# third ggplot call: size mapped to disp
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) +
geom_point()
After running the second and third calls to ggplot() in #5, were the legends for the color and size scales generated automatically? State Yes or No. YES
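One way to see what the two mappings do is to inspect the values ggplot2 computes for each point. The sketch below assumes layer_data() from ggplot2 for that inspection, and the object names p_color, p_size, and p_color_no_legend are illustrative. Each point receives its own colour or size derived from disp, which is why a legend is needed to decode them.

```r
library(ggplot2)

p_color <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point()
p_size <- ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) +
  geom_point()

# layer_data() returns the aesthetic values computed for each point
head(layer_data(p_color)$colour)
head(layer_data(p_size)$size)

# The automatically generated legend can be switched off if it is not wanted
p_color_no_legend <- p_color + theme(legend.position = "none")
```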
In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale. Another argument of aes() is the shape of the points. There are a finite number of shapes which ggplot() can automatically assign to the points. However, if you try this command in the code chunk below, you will receive an error. Run the code and examine the error that is produced.
library(ggplot2)
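The failing command itself is not shown in this write-up. A minimal sketch, assuming it maps shape to disp in the same wt-versus-mpg plot used in the earlier calls (the object names p_shape, shape_error, and p_shape_ok are hypothetical):

```r
library(ggplot2)

# Assumed reconstruction of the failing call: shape mapped to continuous disp
p_shape <- ggplot(mtcars, aes(x = wt, y = mpg, shape = disp)) +
  geom_point()

# The error is raised when the plot is built (for example, on printing),
# so capture it with tryCatch() instead of stopping the script
shape_error <- tryCatch(ggplot_build(p_shape), error = function(e) e)
print(conditionMessage(shape_error))

# Mapping shape to a categorical variable works
p_shape_ok <- ggplot(mtcars, aes(x = wt, y = mpg, shape = factor(cyl))) +
  geom_point()
```

Creating p_shape succeeds on its own; the problem only surfaces once ggplot2 tries to choose a shape scale for a continuous variable.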
The code in the code chunk above gives an error. What does it mean?
A. shape is not a defined argument
B. shape only makes sense with categorical data and disp is continuous
C. shape only makes sense with continuous data and disp is categorical
D. shape is not a variable in your data frame
Type one and only one letter as your answer to #7. B
The diamonds data frame contains the prices and other attributes of almost 54,000 diamonds (53,940 rows). It ships with the ggplot2 package, so it is available as soon as ggplot2 is loaded. Among the variables included are carat (the weight of the diamond) and price.
You will be working with a subset of this data frame, named diamonds_sample. Run the following code chunk to create the diamonds_sample data frame. You will use the diamonds_sample data frame to answer all questions for this assignment. Do not use the diamonds data frame.
diamonds_sample <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]
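Because sample() draws rows at random, each run produces a different sample, so the str() output and the plots will not match exactly between runs. A minimal sketch of a reproducible draw, assuming an arbitrary seed value of 123 (not part of the assignment) and the hypothetical names draw_a and draw_b:

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(123)     # arbitrary seed, chosen only for illustration
draw_a <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

set.seed(123)     # resetting the seed reproduces the same rows
draw_b <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

# Check whether the two draws selected exactly the same rows
identical(draw_a, draw_b)
```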
Here you will use two common geom layer functions: geom_point() and geom_smooth(). We already discussed in class how these layers are added using the + operator.
str(diamonds_sample)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1000 obs. of 10 variables:
## $ carat : num 1.21 0.31 2 1.31 1.05 0.3 0.78 0.28 0.41 1.58 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 2 4 4 4 4 5 2 4 5 4 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 5 2 4 5 2 6 1 3 1 6 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 3 6 2 2 2 7 4 4 3 5 ...
## $ depth : num 63.8 62.2 61.8 61.7 60.4 61.1 64 59.6 60.3 61.1 ...
## $ table : num 64 56 58 59 58 57 54 61 57 59 ...
## $ price : int 5407 1116 16262 4548 4742 552 3298 567 791 10618 ...
## $ x : num 6.72 4.31 8.05 7.03 6.59 4.32 5.82 4.28 4.82 7.52 ...
## $ y : num 6.63 4.28 7.99 6.98 6.55 4.36 5.86 4.25 4.86 7.44 ...
## $ z : num 4.26 2.67 4.96 4.32 3.97 2.65 3.74 2.54 2.92 4.57 ...
# Add geom_point() with +
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_point()
# Add geom_point() and geom_smooth() with +
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
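The messages differ because, when no method is given, geom_smooth() picks one from the size of the largest group: loess for fewer than 1,000 observations, gam otherwise. The full 1,000-row sample therefore gets gam, while the smaller per-clarity groups get loess. A sketch of naming the method explicitly, which also avoids the message, assuming an arbitrary seed and the hypothetical names smooth_sample, p_loess, and p_lm:

```r
library(ggplot2)

set.seed(123)  # arbitrary seed so the sketch is reproducible
smooth_sample <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

# Naming the method and formula explicitly avoids the informational message
p_loess <- ggplot(smooth_sample, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# A straight-line fit for comparison
p_lm <- ggplot(smooth_sample, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```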
Copy the ggplot() command from #12 (with clarity mapped to color). Remove the smooth layer and add the points layer back in. Set alpha = 0.4 inside geom_point(); this makes the points 40% opaque (that is, 60% transparent), so overlapping points remain visible.
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.4)
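The call above combines a mapped aesthetic with a fixed one: color = clarity inside aes() varies from row to row, while alpha = 0.4 outside aes() is a constant applied to every point. A small sketch that inspects both, assuming an arbitrary seed and the hypothetical names alpha_sample, p_alpha, and point_data:

```r
library(ggplot2)

set.seed(123)  # arbitrary seed so the sketch is reproducible
alpha_sample <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

p_alpha <- ggplot(alpha_sample, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.4)

# layer_data() returns the aesthetic values computed for each point
point_data <- layer_data(p_alpha)

unique(point_data$alpha)           # the single constant set in geom_point()
length(unique(point_data$colour))  # number of distinct colours in use
```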
In #14 you are going to explore some of the different grammatical elements of ggplot2. You will start by creating a ggplot object from the diamonds_sample dataset. Next, you will add layers onto this object to build informative graphics.
This problem can be broken into three parts.
a. Define the data (diamonds_subset) and aesthetics layers. Map carat to the x-axis and price to the y-axis. Assign the result to an object named dia_plot.
b. Using +, add a geom_point() layer (with no arguments) to the dia_plot object. This can be written on a single line or across multiple lines.
c. You can also call aes() within the geom_point() function. Map clarity to the color argument in this way.
library(ggplot2)
# part a: data and aesthetics layers
diamonds_subset <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]
dia_plot <- ggplot(diamonds_subset, aes(x = carat, y = price))
# part b: add a points layer
dia_plot <- ggplot(diamonds_subset, aes(x = carat, y = price)) + geom_point()
# part c: clarity mapped to color (here in ggplot(), so every later layer inherits it)
dia_plot <- ggplot(diamonds_subset, aes(x = carat, y = price, color = clarity)) + geom_point()
dia_plot
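Part c asks for aes() to be called inside geom_point(). A minimal sketch of that variant, assuming an arbitrary seed and the hypothetical names subset_demo, base_plot, and points_by_clarity. Unlike a mapping placed in ggplot(), a mapping placed inside geom_point() applies to the points layer only and is not inherited by layers added later.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed so the sketch is reproducible
subset_demo <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

# Data and position aesthetics only; no colour mapping at the plot level
base_plot <- ggplot(subset_demo, aes(x = carat, y = price))

# Colour is mapped inside the geom, so it affects this layer alone
points_by_clarity <- base_plot + geom_point(aes(color = clarity))

# For comparison: a points layer without the mapping uses one colour
points_plain <- base_plot + geom_point()
```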
This problem is a continuation of #14. You have created an object entitled dia_plot. This problem can be broken into three parts.
a. Update dia_plot so that it contains all the functions needed to make a scatter plot, using geom_point() for the geom layer. Set alpha = 0.2.
b. Using +, plot the dia_plot object with a geom_smooth() layer on top. You do not want any error shading, which can be achieved by setting se = FALSE in geom_smooth().
c. Modify the geom_smooth() function from part b so that it contains aes() and maps clarity to the col argument.
# part a
# Expand dia_plot by adding geom_point() with alpha set to 0.2.
dia_plot <- dia_plot + geom_point(alpha = 0.2)
# part b
# Plot dia_plot with an additional geom_smooth() layer, with se set to FALSE.
dia_plot + geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# part c
# Copy the command from part b and add aes() with the clarity mapping to geom_smooth().
dia_plot + geom_smooth(aes(color = clarity), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
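A sketch of how the placement of the clarity mapping changes the smooth layer. It assumes a base plot with no colour mapping in ggplot(), an arbitrary seed, method = "lm" to keep the fit fast, and hypothetical object names. Without a colour mapping, geom_smooth() fits one curve to all points; with aes(color = clarity) it fits a separate curve per clarity level.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed so the sketch is reproducible
smooth_demo <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]

base_plot <- ggplot(smooth_demo, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2)

# One fit across all points
one_curve <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# One fit per clarity level, because colour is mapped inside the layer
curve_per_clarity <- base_plot +
  geom_smooth(aes(color = clarity), method = "lm", formula = y ~ x, se = FALSE)

# Count the groups in the smooth layer (layer 2) of each plot
n_groups_one <- length(unique(layer_data(one_curve, 2)$group))
n_groups_clarity <- length(unique(layer_data(curve_per_clarity, 2)$group))
```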