.html file as: YourName_ANLY512-0-2018.html and upload it to the “Visualization Coding Exercise #2” assignment on Moodle. This assignment is worth 30 points. Questions 1-10 are worth 1 point each. Questions 11-15 are worth 4 points each.
To get a first feel for ggplot2, let’s run some basic ggplot2 commands. Together, they build a plot of the mtcars dataset, which contains information about 32 cars (1973-74 models) from Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables. Questions 1-7 are based on the mtcars data frame.
library(ggplot2)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg)) +
geom_point()
The plot from #3 isn’t really satisfying. Although cyl (the number of cylinders) is categorical, it is classified as numeric in mtcars. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
geom_point()
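To see why the conversion matters, a quick check with base R shows how cyl is stored and what factor() does to it. This is a minimal sketch; the name cyl_factor is introduced here only for illustration.

```r
# cyl is stored as a plain numeric column in mtcars
class(mtcars$cyl)

# factor() converts it to a categorical variable with one level per distinct value
cyl_factor <- factor(mtcars$cyl)
levels(cyl_factor)

# Count how many cars fall into each cylinder class
table(cyl_factor)
```

With a factor on the x-axis, ggplot2 uses a discrete scale, so only the cylinder counts that actually occur get a position on the axis.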
We will use several datasets throughout the class to showcase the concepts discussed in the weekly lectures. In the previous exercises, you already got to know mtcars. Let’s dive a little deeper to explore the three main layers in the grammar of graphics: data, aesthetics, and geom layers.
The mtcars dataset contains information about 32 cars (1973-74 models) from Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.
Think about how the examples and concepts we discuss throughout the grammar of graphics lectures can be applied to your own datasets!
#first ggplot call
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point()
In the second call of ggplot() change the color argument in aes(). The color should be dependent on the displacement of the car engine, found in disp.
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
geom_point()
In the third call of ggplot() change the size argument in aes(). The size should be dependent on the displacement of the car engine, found in disp.
#third ggplot call
# size is mapped to disp (engine displacement)
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) +
geom_point()
Yes, automatically generated with color and size attributes in ggplot2
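As a small sketch combining the second and third calls, both aesthetics can be mapped to disp in a single aes(). The object name p_both is hypothetical and used only here.

```r
library(ggplot2)

# Map disp to both color and size in a single aes() call
p_both <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp, size = disp)) +
  geom_point()

# List the aesthetics that are mapped to variables
# (ggplot2 stores the color aesthetic under the name "colour")
names(p_both$mapping)
```

Printing p_both draws the plot; ggplot2 adds the guides for the mapped aesthetics without any extra code.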
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point()
Mapping disp to shape in the code chunk above (adding shape = disp inside aes()) gives an error. What does it mean?
a. shape is not a defined argument
b. shape only makes sense with categorical data and disp is continuous
c. shape only makes sense with continuous data and disp is categorical
d. shape is not a variable in your data frame
Type one and only one letter as your answer to #7.
Answer is b
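The behavior behind this answer can be reproduced with a short sketch. It assumes the failing call mapped shape = disp inside aes(); the names p_bad, p_ok, err_msg, and built_ok are illustrative. ggplot_build() is used so that the error can be caught without drawing anything.

```r
library(ggplot2)

# Attempt to map the continuous variable disp to shape; building the plot fails
p_bad <- ggplot(mtcars, aes(x = wt, y = mpg, shape = disp)) +
  geom_point()
err_msg <- tryCatch(
  {
    ggplot_build(p_bad)
    NULL  # reached only if no error occurred
  },
  error = function(e) conditionMessage(e)
)
err_msg

# A categorical variable works: factor(cyl) has three levels, so three shapes
p_ok <- ggplot(mtcars, aes(x = wt, y = mpg, shape = factor(cyl))) +
  geom_point()
built_ok <- ggplot_build(p_ok)
```

Shapes form a small set of distinct symbols with no natural ordering, which is why a continuous variable cannot be spread across them.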
The diamonds data frame contains information on the prices and various metrics of almost 54,000 diamonds. This data frame is built into the ggplot2 package. Among the variables included are carat (the weight of the diamond) and price.
You will be working with a subset of this data frame. The name of the data frame you will be using is diamonds_sample. Run the following code chunk to create the diamonds_sample data frame. You will use the diamonds_sample data frame to answer all remaining questions in this assignment. Do not use the full diamonds data frame.
diamonds_sample <- diamonds[sample(1:nrow(diamonds), 1000, replace = FALSE), ]
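Because sample() draws rows at random, each run of the chunk above produces a different subset. If reproducibility is wanted, one option is to set a seed first. The sketch below is an assumption, not part of the assignment: the seed value 512 and the names sample_a and sample_b are arbitrary. Note that the str() output shown in this document came from an unseeded draw, so a seeded sample will contain different rows.

```r
library(ggplot2)

# Fix the random seed so the same 1000 rows are drawn on every run
# (the seed value 512 is an arbitrary choice for this sketch)
set.seed(512)
sample_a <- diamonds[sample(nrow(diamonds), 1000, replace = FALSE), ]

# Resetting the same seed reproduces the identical subset
set.seed(512)
sample_b <- diamonds[sample(nrow(diamonds), 1000, replace = FALSE), ]

nrow(sample_a)
identical(sample_a, sample_b)
```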
Here you will use two common geom layer functions: geom_point() and geom_smooth(). We already discussed in class how these layers are added using the + operator.
str(diamonds_sample)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1000 obs. of 10 variables:
## $ carat : num 0.67 0.62 1.2 1.59 0.7 0.31 0.57 1.14 0.34 1.51 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 2 3 5 5 2 5 3 4 3 5 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 7 5 6 2 2 5 3 3 6 4 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 5 4 2 2 3 4 2 3 6 3 ...
## $ depth : num 64.2 61.8 60.5 62.3 63.6 61.5 63.5 62.5 61.3 61.6 ...
## $ table : num 55.6 57 58 55 62 55.2 56 59 57 56 ...
## $ price : int 1581 1734 5050 11251 2386 513 1397 5228 596 8283 ...
## $ x : num 5.54 5.47 6.92 7.52 5.57 4.36 5.28 6.67 4.49 7.34 ...
## $ y : num 5.57 5.5 6.83 7.48 5.6 4.39 5.25 6.65 4.52 7.25 ...
## $ z : num 3.57 3.39 4.16 4.67 3.55 2.7 3.34 4.16 2.76 4.5 ...
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_point()
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam'
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price)) + geom_smooth()
## `geom_smooth()` using method = 'gam'
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) + geom_smooth()
## `geom_smooth()` using method = 'loess'
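The messages above report 'gam' for the ungrouped smooths and 'loess' once clarity is mapped to color. As described in the ggplot2 documentation, when method is left unset geom_smooth() chooses loess if the largest group has fewer than 1,000 observations and gam otherwise; the full sample has exactly 1,000 rows, while each clarity group is smaller. The sketch below names the method explicitly instead. The seed, demo_sample, p_loess, and smooth_data are assumptions made for illustration only.

```r
library(ggplot2)

set.seed(512)  # arbitrary seed, used only to make this sketch reproducible
demo_sample <- diamonds[sample(nrow(diamonds), 1000, replace = FALSE), ]

# Name the smoothing method and formula explicitly instead of relying on the default
p_loess <- ggplot(demo_sample, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Layer 2 holds the fitted curve evaluated on a grid of carat values
smooth_data <- layer_data(p_loess, i = 2)
head(smooth_data[, c("x", "y")])
```

Stating the method in the call keeps the result the same even if the sample size changes and crosses the threshold.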
Copy the ggplot() command from #12 (with clarity mapped to color). Remove the smooth layer. Add the points layer back in. Set alpha = 0.4 inside geom_point(); this makes the points 40% opaque (that is, 60% transparent).
library(ggplot2)
ggplot(diamonds_sample, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.4)
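In this call color is mapped to a variable inside aes(), whereas alpha is set to a constant outside aes(). A rough sketch of the difference, using hypothetical names (demo_sample, p_alpha, point_data) and an arbitrary seed, inspects the data ggplot2 computes for the points layer:

```r
library(ggplot2)

set.seed(512)  # arbitrary seed, used only to make this sketch reproducible
demo_sample <- diamonds[sample(nrow(diamonds), 1000, replace = FALSE), ]

# alpha is set to a constant outside aes(), so every point gets the same opacity
p_alpha <- ggplot(demo_sample, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.4)

# Inspect the data computed for the points layer
point_data <- layer_data(p_alpha, i = 1)
unique(point_data$alpha)            # a single value: the constant that was set
length(unique(point_data$colour))   # several values: one color per clarity level present
```

A constant setting does not produce a legend, while the mapped color aesthetic does.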
In #14 you are going to explore some of the different grammatical elements of ggplot2. You will start by creating a ggplot object from the diamonds_sample dataset. Next, you will add layers onto this object to build informative graphics.
This problem can be broken into three parts.
a. Define the data (diamonds_sample) and aesthetics layers. Map carat to the x-axis and price to the y-axis. Assign the result to an object named dia_plot.
b. Using +, add a geom_point() layer (with no arguments) to the dia_plot object. This can be done in a single line or in multiple lines.
c. You can also call aes() within the geom_point() function. Map clarity to the color argument in this way.
library(ggplot2)
#part a: define the data and aesthetics layers
dia_plot <- ggplot(diamonds_sample, aes(x = carat, y = price))
dia_plot
#part b: add a points layer to the existing object with +
dia_plot + geom_point()
#part c: map clarity to color inside geom_point()
dia_plot + geom_point(aes(color = clarity))
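The layering idea in this problem can be checked directly: adding a layer with + returns a new plot object and leaves the original untouched. The following is a minimal sketch using mtcars so that it is self-contained; base_plot and with_points are illustrative names.

```r
library(ggplot2)

# Data and aesthetics only: no geom layers yet
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg))
length(base_plot$layers)

# + returns a new plot object with one more layer
with_points <- base_plot + geom_point()
length(with_points$layers)

# base_plot itself is unchanged
length(base_plot$layers)
```

This is why a base object such as dia_plot can be reused with different geoms without rebuilding the data and aesthetics each time.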
This problem is a continuation of #14. You have created an object entitled dia_plot. This problem can be broken into three parts.
a. Update dia_plot so that it contains all the functions needed to make a scatterplot, using geom_point() for the geom layer. Set alpha = 0.2.
b. Using +, plot the dia_plot object with a geom_smooth() layer on top. You do not want any error shading, which can be achieved by setting se = FALSE in geom_smooth().
c. Modify the geom_smooth() function from part b so that it contains aes() and maps clarity to the col argument.
#part a
dia_plot <- ggplot(diamonds_sample, aes(x = carat, y = price))
dia_plot <- dia_plot + geom_point(alpha=0.2)
#part b
dia_plot + geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'gam'
#part c
dia_plot + geom_smooth(aes(col = clarity), se = FALSE)
## `geom_smooth()` using method = 'loess'
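The difference between parts b and c comes down to grouping: an aesthetic mapped inside a layer splits only that layer into groups. The sketch below counts the groups in each smooth layer. It swaps in method = "lm" purely to keep the example fast and robust for small groups (the assignment itself uses the default method), and the seed and object names are assumptions.

```r
library(ggplot2)

set.seed(512)  # arbitrary seed, used only to make this sketch reproducible
demo_sample <- diamonds[sample(nrow(diamonds), 1000, replace = FALSE), ]

base_plot <- ggplot(demo_sample, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2)

# No color mapping in the smooth layer: one fitted line for all points
p_single <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# clarity mapped inside the smooth layer: one fitted line per clarity level
p_grouped <- base_plot +
  geom_smooth(aes(col = clarity), method = "lm", formula = y ~ x, se = FALSE)

# Count the distinct groups in each smooth layer (layer 2)
n_single <- length(unique(layer_data(p_single, i = 2)$group))
n_grouped <- length(unique(layer_data(p_grouped, i = 2)$group))
c(single = n_single, grouped = n_grouped)
```

The points layer stays ungrouped in both plots because the clarity mapping lives only in geom_smooth().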