The objective of this problem set is to gain experience working with the ggplot2
package for data visualization.
This graphic is a traditional stacked bar chart. It uses the mpg
dataset, which is built into the ggplot2
package, so you can access it directly with ggplot(mpg, ....).
The graphic has one modification to the defaults: I renamed the legend for clarity.
library(datasets) #Load default datasets
library(ggplot2) #Load ggplot2 package
ggplot(mpg)+ #Create plot for mpg dataset
geom_bar(aes(x=class,fill=trans))+ #Plot Bar Chart, set x-axis=class, set legend = trans
scale_fill_discrete(name="Transmission") #Name the legend Transmission
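To see what the stacked bars encode, the counts can be reproduced outside the plot. By default geom_bar() counts the rows in each group, so a cross-tabulation of class against trans should give the height of every stacked segment. This is a minimal sketch; the names counts and bar_heights are my own and are not part of the original code.

```r
library(ggplot2)  # provides the mpg dataset

# Cross-tabulate vehicle class against transmission type.
# Each cell corresponds to one stacked segment in the bar chart.
counts <- table(mpg$class, mpg$trans)
print(counts)

# Row totals correspond to the full height of each bar.
bar_heights <- rowSums(counts)
print(bar_heights)
```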
This boxplot is also built using the mpg
dataset. Notice the changed axis labels and the altered theme, set with a theme_XXXX function (here theme_classic()).
ggplot(mpg)+
geom_boxplot(aes(manufacturer,hwy)) + #Boxplot of highway fuel efficiency by manufacturer
coord_flip()+ #Flip to horizontal view
labs(y = "Highway Fuel Efficiency (mile/gallon)", x="Vehicle Manufacturer")+ # Assign x and y-labels
theme_classic() #Use classic theme
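The line inside each box marks the group median, so the ranking visible in the boxplot can be checked numerically. The sketch below computes the median highway mileage per manufacturer with base R; the name med_hwy is an illustrative choice, not something from the original code.

```r
library(ggplot2)  # provides the mpg dataset

# Median highway mileage for each manufacturer, sorted from lowest to highest.
med_hwy <- sort(tapply(mpg$hwy, mpg$manufacturer, median))
print(med_hwy)
```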
This graphic is built with another dataset, diamonds,
which is also built into the ggplot2
package. For this one I used an additional package, ggthemes, loaded with library(ggthemes).
Install and load it to reproduce this view.
library(ggthemes)
ggplot(diamonds)+
geom_density(aes(price, #Density plot of diamond price
fill=cut, #Set legend = cut. Using fill colors to differentiate
color=cut), #Using stroke colors to differentiate
alpha=0.2, #Set transparency level of fill colors
size=0.6)+ #Set width of strokes
labs(title = "Diamond Price Density",x="Diamond Price (USD)",y="Density")+ #Add title and X and Y Labels
theme_economist() #Use theme 'Economist'
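Each curve in the density plot is a kernel density estimate of price for one cut. As a rough check of what that means, the sketch below estimates the density for a single cut with base R's density() and confirms that the area under the curve is close to 1. The choice of the "Ideal" cut and the names ideal_price, d, and area are my own assumptions for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Prices for one cut only; the plot draws one such curve per cut.
ideal_price <- diamonds$price[diamonds$cut == "Ideal"]

# Kernel density estimate on an evenly spaced grid of price values.
d <- density(ideal_price)

# Approximate the area under the curve with a simple Riemann sum.
area <- sum(d$y) * diff(d$x)[1]
print(area)
```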
For this plot we switch visual idioms to a scatter plot. Additionally, I use the ggplot2
package to fit a linear model to the data within the plot itself. There are edited labels and theme modifications as well.
ggplot(iris,
aes(Sepal.Length,Petal.Length))+ # set x-axis as Sepal.Length; y-axis as Petal.Length
geom_point()+ # Scatterplot
geom_smooth(method=lm)+ # Add regression line to scatterplot
labs(title="Relationship between Petal and Sepal Length",x="Iris Sepal Length",y="Iris Petal Length")+ # Label title, x and y labels
theme_minimal() # Use theme 'minimal'
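The regression line drawn by geom_smooth(method=lm) comes from an ordinary linear model of y on x. The same model can be fitted directly to inspect its coefficients, as in this minimal sketch; the name fit is illustrative.

```r
# iris ships with base R's datasets package, so no extra packages are needed.
# Fit Petal.Length as a linear function of Sepal.Length.
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Intercept and slope of the fitted line.
print(coef(fit))

# Share of the variance in Petal.Length explained by the model.
print(summary(fit)$r.squared)
```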
Finally, this visualization extends the last example by plotting the same data with an additional channel, color, to communicate species-level differences. Again I fit a linear model to the data, this time one for each species, and add further theme and labeling modifications.
ggplot(iris,
aes(Sepal.Length, #Set x-axis as Sepal Length
Petal.Length, #Set y-axis as Petal Length
color=Species))+ #Set legend = Species; Use colors to differentiate Species
geom_point()+ # Scatterplot
geom_smooth(method=lm,se=FALSE)+ # Draw regression lines without confidence regions
labs(title="Relationship between Petal and Sepal Length",
subtitle = "Species level comparison",
x="Iris Sepal Length",
y="Iris Petal Length")+ # Label title, subtitle, x and y labels
theme_tufte()+ # Use theme 'tufte' from ggthemes
theme(legend.position="bottom") # Position legend at the bottom
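Because color=Species is mapped in the main aes() call, geom_smooth() fits a separate line for each species. The per-species models can be reproduced by splitting the data and fitting one lm() per group. This is a sketch under that assumption, and the name species_fits is my own.

```r
# Split iris by species and fit one linear model per group.
species_fits <- lapply(split(iris, iris$Species), function(d) {
  coef(lm(Petal.Length ~ Sepal.Length, data = d))
})

# One row per species: intercept and slope of its regression line.
print(do.call(rbind, species_fits))
```

Comparing the slopes across rows shows how the relationship differs by species, which the single pooled line in the previous plot hides.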