Using ggplot2

Before we begin, we need to load the ‘ggplot2’ and ‘ggthemes’ libraries. We will be using the mpg data set, which ships with ggplot2, for this project.

library(ggplot2)
library(ggthemes)

Part A:

Create a bar graph for the “manufacturer” variable. Names of manufacturers should be displayed at a 90 degree angle.

For this question, we use a combination of the functions ggplot(), geom_bar(), theme() and labs() to create the bar graph.

# Creating a variable 'parta' to store the graph.
parta <- ggplot(mpg, aes(x = manufacturer)) +
  geom_bar(fill = "blue") + # Adding bars to the empty plot, colored blue.
  theme_gray() + # Applying theme for aesthetic purposes.
  theme(axis.text.x = element_text(angle = 90)) + # Aligning x axis labels vertically.
  labs(title = "Distribution of Manufacturer in the 'mpg' dataset",
       x = "Manufacturer",
       y = "Count") # Adding a title, and renaming the x and y axes.
  
# Printing graph
parta
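
With angle = 90 alone, the rotated labels stay centered on their anchor point and can sit slightly off their tick marks. A minimal sketch of one common adjustment follows. The hjust and vjust values are a suggestion rather than part of the original answer, and 'parta_aligned' is a hypothetical name:

```r
library(ggplot2)

# Hypothetical variant of 'parta': same plot, with the rotated labels
# right-justified (hjust = 1) and vertically centered (vjust = 0.5) on the ticks.
parta_aligned <- ggplot(mpg, aes(x = manufacturer)) +
  geom_bar(fill = "blue") +
  theme_gray() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(title = "Distribution of Manufacturer in the 'mpg' dataset",
       x = "Manufacturer",
       y = "Count")

# Printing graph
parta_aligned
```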

Part B:

Create a graph for the ‘year’ variable.

This is similar to the previous question, except that, since it was not specified, we leave the x axis labels horizontal. Another thing to note is that we have to treat the ‘year’ variable as a factor and not an integer, so that we obtain an x axis with discrete values.

# Creating a variable 'partb' to store the graph.
partb <- ggplot(mpg, aes(x = factor(year))) + # Treating the 'year' variable as a factor.
  geom_bar(fill = "blue") + # Adding bars to empty plot, colored blue.
  theme_gray() + # Applying theme for aesthetic reasons.
  labs(title = "Distribution of Year in the 'mpg' dataset",
       x = "Year",
       y = "Count") # Adding a title, and renaming the x and y axes.

# Printing graph
partb
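
The reason for using factor() can be checked directly on the data. A small sketch, in which 'year_values' and 'year_factor' are hypothetical names introduced only for this check:

```r
library(ggplot2)

# 'year' is stored as a number in mpg, so ggplot2 would give it a continuous axis.
is.numeric(mpg$year)

# Listing the distinct years shows how few values the variable takes.
year_values <- sort(unique(mpg$year))
year_values

# Converting to a factor turns each distinct year into its own level,
# which is what gives the bar graph a discrete x axis.
year_factor <- factor(mpg$year)
levels(year_factor)
```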

Part C:

Create a density curve for each of the quantitative variables (displ, cty, hwy), conditioning on each type of cylinder. You should overlay the 3 curves for displ, 3 for cty, and 3 for hwy. You should have 3 separate plots, each having 3 curves.

In this question, we use the geom_density() function to generate density curves. For each of the ‘displ’, ‘cty’ and ‘hwy’ variables, we map the curve color to the ‘cyl’ variable (treated as a factor), so that one curve is drawn per number of cylinders and the curves are overlaid for comparison.

# Creating a variable 'partc1' to store the graph.
partc1 <- ggplot(mpg, aes(x = displ, color = factor(cyl))) + # Mapping x to 'displ', and coloring the curves by the 'cyl' variable.
  geom_density() + # Adding density curves.
  labs(title = "Density distribution of the variable 'displ'",
       x = "Displ",
       y = "Density", 
       col = "Cyl") + # Adding title, and renaming x and y axes, and also the legend title.
  theme_gray() # Applying a theme.

# Printing the graph.
partc1

We then repeat this for the other two variables.

partc2 <- ggplot(mpg, aes(x = cty, color = factor(cyl))) + # This time mapped for 'cty'.
  geom_density() + 
  labs(title = "Density distribution of the variable 'cty'",
       x = "Cty",
       y = "Density", 
       col = "Cyl") + 
  theme_gray() 

partc2

partc3 <- ggplot(mpg, aes(x = hwy, color = factor(cyl))) + # This time mapped for 'hwy'.
  geom_density() + 
  labs(title = "Density distribution of the variable 'hwy'",
       x = "Hwy",
       y = "Density", 
       col = "Cyl") + 
  theme_gray() 

partc3
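
Since the three plots differ only in the variable being plotted, the repetition could also be folded into a small helper. This is a sketch of an alternative, not the approach used above: 'density_by_cyl' and 'partc_plots' are hypothetical names, and the axis label here is the raw column name rather than the capitalized version.

```r
library(ggplot2)

# Hypothetical helper: builds the density plot for one quantitative column of mpg.
# 'var' is the column name as a string; .data[[var]] looks that column up inside aes().
density_by_cyl <- function(var) {
  ggplot(mpg, aes(x = .data[[var]], color = factor(cyl))) +
    geom_density() + # One density curve per value of 'cyl'.
    labs(title = paste0("Density distribution of the variable '", var, "'"),
         x = var,
         y = "Density",
         color = "Cyl") +
    theme_gray()
}

# One plot per variable, stored in a named list.
partc_plots <- lapply(c(displ = "displ", cty = "cty", hwy = "hwy"), density_by_cyl)

# Printing the three graphs.
partc_plots$displ
partc_plots$cty
partc_plots$hwy
```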

Part D:

Create side-by-side boxplots for ‘displ’ by ‘cyl’.

For this question, we use the geom_boxplot() function, with ‘cyl’ converted to a factor so that each cylinder count gets its own box.

# Creating the variable 'partd' to store the graph.
partd <- ggplot(mpg, aes(x = factor(cyl), y = displ)) +
  geom_boxplot(fill = "blue") + # Adding boxplot to empty graph, colored blue.
  theme_gray() + # Adding a theme
  labs(title = "Side-by-side Boxplots for 'displ' by 'cyl'",
       x = "Cyl",
       y = "Displ") # Adding a main title, and renaming the x and y axes.

# Printing the graph.
partd