Base Package and ggplot2 Part 1

To better appreciate ggplot2 and understand how it differs from the base package, it’s useful to make some comparisons. In the class lecture, you already saw one example of a (poor) multivariate plot made with the base package. In this series of exercises, you’ll look at a better way to make the equivalent plot in ggplot2.

First, let’s focus on the base package. You want to plot mpg (miles per gallon) against wt (weight in thousands of pounds) from the mtcars data frame, but this time with the dots colored according to the number of cylinders, cyl. How would you do that in the base package? You can use a little trick: pass a factor variable as the color. This works because a factor is stored as integer codes (1, 2, 3, ...), and plot() uses each code as an index into the current color palette.
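A quick way to see why the trick works is to look at the integer codes behind a factor. The sketch below (the object names fcyl, codes and point_cols are illustrative, not part of the exercises) shows the levels of cyl as a factor, the codes stored for the first six cars, and the palette colors those codes select:

```r
# cyl takes the values 4, 6 and 8; as a factor these become levels "4", "6", "8"
fcyl <- factor(mtcars$cyl)
print(levels(fcyl))

# Underneath, each car is stored as an integer code: 1 = "4", 2 = "6", 3 = "8"
codes <- as.integer(fcyl)
print(head(codes))

# col = fcyl effectively uses these codes as indices into the palette
point_cols <- palette()[codes]
print(head(point_cols))
```

Passing the numeric cyl directly would instead select palette entries 4, 6 and 8, so the colors would differ from the factor version.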

Base Package and ggplot2 Exercises Part 1

  1. Using the base package plot(), make a scatter plot with mtcars$wt on the x-axis and mtcars$mpg on the y-axis, colored according to mtcars$cyl (use the col argument). You could instead use the formula interface with data = mtcars, but here you’ll do it the long way.
#Plot the correct variables of mtcars

plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)

  2. Add a new column, fcyl, to the mtcars data frame, containing cyl converted to a factor.
# Add fcyl to mtcars: cyl converted to a factor

mtcars$fcyl <- as.factor(mtcars$cyl)

  3. Create a plot similar to #1, but this time use fcyl (cyl as a factor) to set col.
plot(mtcars$wt, mtcars$mpg, col = mtcars$fcyl)
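The shorter route mentioned in exercise 1 uses the formula interface of plot(), which evaluates both the formula terms and extra arguments such as col inside the data frame passed as data. A minimal sketch; the axis labels are an illustrative addition:

```r
# Recreate fcyl so this snippet runs on its own
mtcars$fcyl <- factor(mtcars$cyl)

# Formula interface: y ~ x, with mpg, wt and fcyl all looked up in mtcars
plot(mpg ~ wt, data = mtcars, col = fcyl,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```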

Base Package and ggplot2 Part 2

If you want to add a linear model to your plot, you can fit it with lm() and then draw the resulting model with abline(). However, if you want a separate model for each subgroup defined by the number of cylinders, you have a couple of options.

You can subset your data, fit lm() to each subset, and plot each fit separately. Alternatively, you can vectorize over the cyl variable with lapply() and do it all in one step. This option is already prepared for you in problem #6.
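The first option, subsetting the data and fitting one model per subset, might look like the sketch below. The names fit4, fit6 and fit8 are illustrative, and the colors 1 to 3 assume cyl has been converted to a factor so that the points use the first three palette colors:

```r
# Treat cyl as a factor so col = cyl maps 4/6/8 cylinders to palette colors 1/2/3
mtcars$cyl <- factor(mtcars$cyl)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)

# Option 1: subset the data by hand and fit one linear model per subset
fit4 <- lm(mpg ~ wt, data = subset(mtcars, cyl == "4"))
fit6 <- lm(mpg ~ wt, data = subset(mtcars, cyl == "6"))
fit8 <- lm(mpg ~ wt, data = subset(mtcars, cyl == "8"))

# Draw each fitted line in the same color as its points
abline(fit4, col = 1)
abline(fit6, col = 2)
abline(fit8, col = 3)
```

This works, but the code repeats itself once per group, which is what the lapply() version avoids.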


Base Package and ggplot2 Exercises Part 2

  4. Use lm() to fit a linear model of mpg as a function of wt and save it as an object called carModel.
# Use lm() to calculate a linear model and save it as carModel

carModel <- lm(mpg ~ wt, data = mtcars)

  5. Draw the linear model on the scatter plot. Write code that calls abline() with carModel as the first argument, and set the line type by passing the argument lty = 2.
# Run this code together with the abline() call you write below; run it all at once
# Basic plot
mtcars$cyl <- as.factor(mtcars$cyl)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)

# Call abline() with carModel as first argument and set lty to 2
abline(carModel, lty = 2)

  6. Run the code already given to generate the plot with a different model for each group. You don’t need to modify any of this code. Study it and understand what it produces. A short description of the code is below.

The code below contains a call to lapply(), which you might not have seen before. lapply() takes a vector and a function, applies the function to each element of the vector, and returns the results in a list. Here, lapply() takes each element of mtcars$cyl and passes it to the function defined in the second argument. That function takes a value x of mtcars$cyl, subsets the data to the rows with cyl == x, fits a linear model to those rows, and adds the fitted line to the plot with abline(). Because mtcars$cyl has 32 elements (one per car) but only three distinct values, each group’s line is fitted and drawn many times over, and since abline() returns NULL, the printed result is a list of 32 NULL entries.

# Plot each subset efficiently with lapply
# You don't have to edit this code
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)
lapply(mtcars$cyl, function(x) {
  abline(lm(mpg ~ wt, mtcars, subset = (cyl == x)), col = x)
  })

## [[1]]
## NULL
## 
## ... ([[2]] through [[31]] are also NULL)
## 
## [[32]]
## NULL
  7. Now that you have an interesting plot, one very important element is missing: the legend!

Run the code already given to generate the plot with a different model for each group and a legend. This is the same code as in problem #6, with the code for the legend added. You do not need to modify any of this code.

# Plot each subset efficiently with lapply (code from #6)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)
lapply(mtcars$cyl, function(x) {
  abline(lm(mpg ~ wt, mtcars, subset = (cyl == x)), col = x)
  })
## [[1]]
## NULL
## 
## ... ([[2]] through [[31]] are also NULL)
## 
## [[32]]
## NULL
#adding a legend to the graph
legend(x = 5, y = 33, legend = levels(mtcars$cyl),
       col = 1:3, pch = 1, bty = "n")
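If you want each group’s line to be fitted only once, one option (a sketch, not part of the original exercises) is to loop over unique(mtcars$cyl) instead of all 32 elements. unique() on a factor returns a factor, so col = x still resolves to the level’s integer code and matches the point colors. Returning each fit, rather than abline()’s NULL, keeps the three models available afterwards; the variable name fits and the "topright" legend placement are illustrative choices:

```r
# Treat cyl as a factor so points and lines share palette colors 1-3
mtcars$cyl <- factor(mtcars$cyl)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)

# Loop over the 3 distinct groups instead of all 32 rows;
# each x is a one-element factor, so col = x uses its level code
fits <- lapply(unique(mtcars$cyl), function(x) {
  fit <- lm(mpg ~ wt, data = mtcars, subset = (cyl == x))
  abline(fit, col = x)
  fit  # return the model instead of abline()'s NULL
})

# Same legend as above, placed with a keyword and given a title
legend("topright", legend = levels(mtcars$cyl), col = 1:3,
       pch = 1, bty = "n", title = "cyl")
```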

Base Package and ggplot2 Part 3

In these exercises you will recreate the base package plot in ggplot2. The code for base R plotting is given in problem #8. The first line of code already converts the cyl variable of mtcars to a factor.

  8. Plotting in base R. This code combines everything from Part 2: the overall dashed line, the per-group lines, and the legend. Study the code.
# Convert cyl to factor (do not need to edit code)
mtcars$cyl <- as.factor(mtcars$cyl)

# Example from base R (do not need to edit code)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)
abline(lm(mpg ~ wt, data = mtcars), lty = 2)
lapply(mtcars$cyl, function(x) {
  abline(lm(mpg ~ wt, mtcars, subset = (cyl == x)), col = x)
  })
## [[1]]
## NULL
## 
## ... ([[2]] through [[31]] are also NULL)
## 
## [[32]]
## NULL
#adding a legend to the graph
legend(x = 5, y = 33, legend = levels(mtcars$cyl),
       col = 1:3, pch = 1, bty = "n")

  9. Plot 1: add geom_point() to make a scatter plot.
library(ggplot2)
# Plot 1: add geom_point() to this command to create a scatter plot
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) + geom_point()

  10. Plot 2: copy Plot 1 and do the following:
      1. Add a linear model for each subset according to cyl by adding a geom_smooth() layer.

      2. Inside this geom_smooth(), set method to “lm” and se to FALSE.

Note: because col = cyl is mapped in the aes() call inside ggplot(), and cyl is a factor, ggplot2 splits the data into one group per cyl value, so geom_smooth() automatically fits and draws a separate line for each subset.

library(ggplot2)
# Plot 2: include the lines of the linear models, per cyl
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) +
  geom_point() + # Copy from Plot 1
  geom_smooth(method = "lm", se = FALSE) # Fill in using instructions for Plot 2

  11. Plot 3: copy Plot 2 and do the following:
      1. Plot a linear model for the entire dataset by adding another geom_smooth() layer.

      2. Set the group aesthetic inside this geom_smooth() layer to 1. This has to be set within the aes() function.

      3. Set method to “lm”, se to FALSE and linetype to 2. These have to be set outside the aes() call of this geom_smooth().

Note: setting group = 1 puts all points into a single group, overriding the grouping by cyl, so this layer draws one linear model through all the points.

library(ggplot2)

# Plot 3: include an lm for the entire dataset as a whole
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) +
  geom_point() + # Copy from Plot 2
  geom_smooth(method = "lm", se = FALSE) + # Copy from Plot 2
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2) # Fill in using instructions for Plot 3
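One way to confirm what the two Notes describe is to inspect the data ggplot2 computes for each layer with layer_data(). The sketch below rebuilds Plot 3 and counts the groups in each geom_smooth() layer; it assumes ggplot2 is installed and converts cyl to a factor itself. Unlike Plot 3, it sets colour = "black" on the overall line, an illustrative choice that mirrors the black dashed line of the base R version. A fixed colour replaces the inherited col = cyl mapping for that layer, which also avoids the warning about a dropped colour aesthetic that recent ggplot2 versions may print for Plot 3.

```r
library(ggplot2)

# geom_smooth() needs cyl as a factor to split the data into groups
mtcars$cyl <- factor(mtcars$cyl)

p <- ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE,
              linetype = 2, colour = "black")

# Layer 2: one fitted line per cylinder group
per_cyl <- layer_data(p, 2)
print(length(unique(per_cyl$group)))

# Layer 3: group = 1 puts every point into a single group
overall <- layer_data(p, 3)
print(length(unique(overall$group)))
```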