In this homework, you will use the diamonds data set from the ggplot2 package. It is loaded automatically when you execute library(ggplot2), so you don’t have to load it separately. Please create a duplicate of diamonds and use that for the homework. This avoids corrupting the original data set.

Make sure that you understand the variables in the data by reading the help file, which you can open by executing help(diamonds) in the console. Five of the variables measure the dimensions of a diamond.
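As a minimal sketch of the two steps above, the chunk below copies the data and inspects the dimension variables. The name d1 matches the copy used in the plotting code later in this homework. The choice of x, y, z, depth, and table as the five dimension variables is an assumption based on the descriptions in help(diamonds), not something this handout states.

```r
library(ggplot2)

# Work on a copy so the packaged data set is never modified
d1 <- diamonds

# Assumed set of dimension variables, taken from the help(diamonds) descriptions
dims <- c("x", "y", "z", "depth", "table")

# Check types and basic ranges before plotting
str(d1[, dims])
summary(d1[, dims])
```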

Instructions

  1. You are expected to recreate each plot exactly as shown in this homework.
  2. The objective of this homework is to help you develop fine-grained control over ggplot2. As such, please refrain from exercising artistic freedom!
  3. All the plots use theme_minimal(). You can set this as your default theme by adding this line to the setup chunk after you load the ggplot2 library: theme_set(theme_minimal())
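
A setup chunk along the following lines would apply instruction 3. It is only a sketch: the variable name old_theme is illustrative and is not required by the homework.

```r
library(ggplot2)

# theme_set() changes the default theme for every later plot in the session.
# It returns the previous default invisibly, so it can be restored if needed.
old_theme <- theme_set(theme_minimal())

# To undo the change later: theme_set(old_theme)
```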

Q1 (3 points)

Recreate the following graph. The parameter that controls transparency is set to 0.3. You need not match the colors exactly, but they must be discrete and must not follow a color gradient.

# Only fill is mapped to clarity. The black outline is a fixed setting, so a
# color mapping (and a color scale) would be overridden and is left out.
ggplot(d1, aes(carat, price)) +
  geom_point(aes(fill = as.character(clarity)),
             color = "black", shape = 21, size = 1.5, alpha = 0.3) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
  labs(title = "Scatterplot of Diamond Prices",
       x = "Diamond Carats",
       y = "Diamond Price")
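
The as.character() call in the code above is what keeps the colors from looking like a gradient. clarity is stored as an ordered factor, and ggplot2 by default gives ordered factors a sequential (viridis) palette, while character vectors get the qualitative hue palette. The sketch below illustrates the difference; the object names p_ordered and p_discrete are illustrative only.

```r
library(ggplot2)
d1 <- diamonds

# clarity is an ordered factor; converting it to character drops the ordering
is.ordered(d1$clarity)
is.ordered(as.character(d1$clarity))

# Ordered factor: the default fill scale is sequential and reads as a gradient
p_ordered <- ggplot(d1, aes(carat, price, fill = clarity)) +
  geom_point(shape = 21, alpha = 0.3)

# Character vector: the default fill scale is a discrete hue palette
p_discrete <- ggplot(d1, aes(carat, price, fill = as.character(clarity))) +
  geom_point(shape = 21, alpha = 0.3)
```

Note that as.character() also sorts the legend entries alphabetically rather than by clarity grade.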

Q2 (2 points)

The previous graph looks cluttered, so you decide to use facets instead. Recreate the following graph:

# Same points as Q1, one panel per clarity level. The legend is suppressed in
# the layer itself, so no separate guides() call is needed.
ggplot(d1, aes(carat, price)) +
  geom_point(aes(fill = as.character(clarity)),
             color = "black", shape = 21, size = 1.5, alpha = 0.3,
             show.legend = FALSE) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
  labs(title = "Scatterplot of Diamond Prices",
       x = "Diamond Carats",
       y = "Diamond Price") +
  facet_wrap(~clarity, nrow = 2)

Q3 (5 points)

Next, you want to know whether the price of diamonds depends on table and depth. Note the line types. Recreate the following graphs:

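# Linear fit of price on table.
# linewidth replaces size for lines in ggplot2 >= 3.4.0; use size on older versions.
ggplot(d1, aes(table, price)) +
  geom_smooth(method = "lm", color = "red", linetype = "dashed", linewidth = 1) +
  xlim(0, 100) +
  labs(x = "Table", y = "Price")

# Linear fit of price on depth, with axis breaks every 10 units
ggplot(d1, aes(depth, price)) +
  geom_smooth(method = "lm", color = "white", linetype = "dotdash", linewidth = 1) +
  scale_x_continuous(limits = c(0, 80), breaks = seq(0, 80, 10)) +
  labs(x = "Depth", y = "Price")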
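
One point worth checking with the two plots above: xlim() and the limits argument of scale_x_continuous() remove observations outside the range before the smoother is fitted, whereas coord_cartesian() only zooms. The sketch below counts how many rows fall outside the limits used here, so you can confirm the fitted lines are based on the full data. The variable names are illustrative.

```r
library(ggplot2)
d1 <- diamonds

# Observed ranges of the two predictors
range(d1$table)
range(d1$depth)

# Rows that the scale limits used in the plots would drop
n_outside_table <- sum(d1$table < 0 | d1$table > 100)
n_outside_depth <- sum(d1$depth < 0 | d1$depth > 80)
```

If either count were nonzero, ggplot2 would warn about removed rows and the fit would use only the remaining data.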

Q4 (5 points)

Recreate each of the following graphs for data exploration:

ggplot(d1, aes(x*y*z, price, color = cut)) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = c("#8856A7", "#9ebcda", "#636363", "#dd1c77", "#d95f0e"))

ggplot(d1, aes(price)) +
  geom_histogram(bins = 75,
                 color = "white") +
  scale_x_continuous(labels = scales::dollar_format(prefix = "$")) +
  scale_y_continuous(labels = scales::comma_format()) 

ggplot(d1, aes(clarity)) +
  geom_bar(color = "red",
           show.legend = FALSE,
           fill = c("#dd1c77","#c994c7","#e7e1ef","#d95f0e","#fec44f","#bcbddc","#756bb1","#efedf5")) 

ggplot(d1, aes(cut, depth)) +
  geom_violin(color = "blue",
              alpha = 0.5) +
  geom_jitter(color = "red",
              alpha = 0.07,
              show.legend = FALSE) 

ggplot(d1, aes(x, price)) +
  geom_smooth(se = FALSE) +
  geom_smooth(method = "lm",
              color = "green")