The objective of this problem set is to gain experience working with the ggplot2 package for data visualization. To do this, we have been provided with a series of graphics, all created using the ggplot2 package. Our task for this assignment is to write the code necessary to recreate the provided graphics exactly.
This graphic is a traditional stacked bar chart. It uses the mpg dataset, which is built into the ggplot2 package, so we can access it directly with ggplot(mpg, ....). There is one modification to the default graphic: we renamed the legend for clarity.
ggplot(mpg, aes(x = class, fill = trans)) +
  geom_bar() +
  guides(fill = guide_legend(title = "Transmission"))
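As a rough check on what geom_bar() draws here, the counts behind the stacked bars can be tabulated directly. This is a minimal sketch, assuming ggplot2 is installed; the name counts is introduced only for illustration.

```r
library(ggplot2)  # provides the mpg dataset

# Cross-tabulate vehicle class (rows) against transmission type (columns).
# Each cell corresponds to the height of one segment in the stacked bar chart.
counts <- table(mpg$class, mpg$trans)
print(counts)

# Row totals, one per class, give the full height of each bar.
print(rowSums(counts))
```

Comparing these totals against the plotted bars is a quick way to confirm that the chart shows raw counts rather than a summarised variable.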
This boxplot is also built using the mpg dataset. Notice the changed axis labels and the altered theme_XXXX.
# hwy is highway miles per gallon in mpg (cty is city mileage)
ggplot(mpg, aes(manufacturer, hwy)) +
  geom_boxplot() +
  coord_flip() +
  labs(y = "Highway Fuel Efficiency (miles/ gallon)",
       x = "Vehicle Manufacturer") +
  theme_bw() +
  ggthemes::geom_rangeframe() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.line = element_line())
This graphic is built with another dataset, diamonds, which is also built into the ggplot2 package. For this one we used an additional package, ggthemes, whose theme_economist() reproduces the look of the provided graphic.
ggplot(diamonds, aes(price, fill = cut)) +
  geom_density(alpha = 0.2) +
  labs(x = "Diamond Price (USD)",
       y = "Density",
       title = "Diamond Price Density") +
  ggthemes::theme_economist()
For this plot, we switch visual idioms to a scatter plot. Additionally, we use the ggplot2 package to fit a linear model to the data, all within the plot framework. There are edited labels and theme modifications as well.
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  labs(x = "Iris Sepal Length",
       y = "Iris Petal Length",
       title = "Relation between Petal Length and Sepal Length") +
  theme_bw() +
  theme(panel.border = element_blank())
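To make the fitted line less of a black box, the same kind of model can be fit outside the plot with base R. This is a minimal sketch, assuming the default formula y ~ x that geom_smooth(method = 'lm') uses; the names fit and new_x are introduced only for illustration.

```r
# iris ships with base R, so no extra packages are needed.
# Regress petal length (y axis) on sepal length (x axis).
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Intercept and slope of the fitted line.
print(coef(fit))

# Predicted mean with a 95% confidence interval at the two ends of the
# x range, comparable to the shaded ribbon around the smoothed line.
new_x <- data.frame(Sepal.Length = range(iris$Sepal.Length))
print(predict(fit, newdata = new_x, interval = "confidence"))
```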
Finally, in this visualization we extend the last example by plotting the same data but using an additional channel, color, to communicate species-level differences. Again, we fit a linear model to the data, but this time one for each species, and add further theme and labeling modifications.
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) +
  labs(x = "Iris Sepal Length",
       y = "Iris Petal Length",
       title = "Relation between Petal Length and Sepal Length",
       subtitle = "Species Level Comparison") +
  geom_point() +
  geom_smooth(method = 'lm',
              se = FALSE) +
  theme_bw() +
  theme(legend.position = "bottom",
        panel.border = element_blank(),
        panel.grid = element_blank())
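Because the color aesthetic groups the data by Species, each species gets its own fitted line. A rough base R equivalent, assuming the same y ~ x formula within each group, is sketched below; by_species, fits, and coefs are illustrative names only.

```r
# Split iris into one data frame per species.
by_species <- split(iris, iris$Species)

# Fit a separate linear model of petal length on sepal length per species.
fits <- lapply(by_species, function(d) {
  lm(Petal.Length ~ Sepal.Length, data = d)
})

# Collect intercepts and slopes: one column per species.
coefs <- sapply(fits, coef)
print(coefs)
```

Comparing the slopes across columns shows how the relationship differs between species, which is what the separate colored lines convey visually.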