Objectives
The objective of this problem set is to gain experience working with the ggplot2 package for data visualization. To do this I have provided a series of graphics, all created using the ggplot2 package. Your objective for this assignment is to write the code necessary to exactly recreate the provided graphics.
When completed, submit a link to your file on rpubs.com. Be sure to include echo = TRUE for each graphic so that I can see both the visualization and the code required to create it.
Vis 1
This graphic is a traditional stacked bar chart. It uses the mpg dataset, which is built into the ggplot2 library, so you can access it simply with ggplot(mpg, ….). There is one modification to the defaults in this graphic: I renamed the legend for clarity.
require(datasets) ## Load base R datasets (iris). Note that mpg and diamonds come from ggplot2, not datasets
require(ggplot2) ## Load ggplot2 package to draw below plots
## Loading required package: ggplot2
require(ggthemes) ## Load ggthemes package to use extra themes for Vis 3 (theme_economist()) and Vis 5 (theme_pander())
## Loading required package: ggthemes
mpg.plot <- ggplot(mpg)  ## Create the base plot for the mpg dataset
mpg.plot +
  geom_bar(aes(class,  ## Bar chart with class on the x-axis
               fill = trans)) +  ## Stack each bar by trans (transmission)
  scale_fill_discrete(name = "Transmission")  ## Rename the legend title
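As a rough check on what the stacked bars encode, the same counts can be tabulated directly. This is only a sketch: it assumes ggplot2 is installed so that mpg is available, and `counts` is a name introduced here for illustration.

```r
library(ggplot2)  # provides the mpg dataset

## Cross-tabulate transmission type against vehicle class.
## Each cell is the height of one colored segment in Vis 1;
## each column total is the full height of one bar.
counts <- table(mpg$trans, mpg$class)
counts
colSums(counts)  # number of vehicles per class (bar heights)
```

Comparing these totals with the bar heights in the plot is a quick way to confirm that geom_bar() is counting rows rather than summing a variable.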
Vis 2
This boxplot is also built using the mpg dataset. Notice the changes in axis labels and the altered theme_XXXX.
mpg.plot +  ## Reuse the mpg base plot
  geom_boxplot(aes(manufacturer, hwy)) +  ## Distribution of highway fuel efficiency by manufacturer
  theme_classic() +  ## Use the classic theme
  coord_flip() +  ## Flip coordinates so manufacturers run along the vertical axis
  labs(y = "Highway Fuel Efficiency (mile/gallon)",  ## y-axis title (drawn horizontally after the flip)
       x = "Vehicle Manufacturer")  ## x-axis title (drawn vertically after the flip)
Vis 3
This graphic is built with another dataset, diamonds, which is also built into the ggplot2 package. For this one I used an additional package, loaded with library(ggthemes). Check it out to reproduce this view.
ggplot(diamonds) +  ## Create plot for the diamonds dataset
  geom_density(aes(price,  ## Density of diamond price
                   fill = cut,  ## Fill color by cut (quality of the cut)
                   color = cut),  ## Outline color by cut as well
               alpha = 0.3,  ## Transparency of the fill color
               size = 0.6) +  ## Outline width (ggplot2 3.4.0 and later prefer linewidth)
  labs(title = "Diamond Price Density",  ## Plot title
       x = "Diamond Price (USD)",  ## x-axis title
       y = "Density") +  ## y-axis title
  theme_economist()  ## Theme styled after The Economist magazine
Vis 4
For this plot we change visual idioms to a scatter plot. Additionally, I use the ggplot2 package to fit a linear model to the data, all within the plot framework. There are edited labels and theme modifications as well.
ggplot(iris,  ## Create plot for the iris dataset
       aes(Sepal.Length, Petal.Length)) +  ## x-axis is sepal length; y-axis is petal length
  geom_point() +  ## Scatter plot
  geom_smooth(method = lm) +  ## Draw the regression line with its confidence band
  theme_minimal() +  ## Use the "minimal" theme
  theme(panel.grid.major = element_line(size = 1),  ## Width of major grid lines
        panel.grid.minor = element_line(size = 0.7)) +  ## Width of minor grid lines
  labs(title = "Relationship between Petal and Sepal Length",  ## Plot title
       x = "Iris Sepal Length",  ## x-axis title
       y = "Iris Petal Length")  ## y-axis title
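To see the numbers behind the fitted line, the same model can be fit outside the plot with base R. This is a minimal sketch, assuming the default formula y ~ x that geom_smooth() uses with method = lm; `fit` is a name introduced here for illustration.

```r
## Fit petal length as a linear function of sepal length,
## the same straight-line model that the plot overlays.
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

coef(fit)               # intercept and slope of the regression line
summary(fit)$r.squared  # share of variance explained by the line
```

A positive slope here corresponds to the upward-sloping line in the scatter plot.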
Vis 5
Finally, this vis extends the last example by plotting the same data but using an additional channel to communicate species-level differences. Again I fit a linear model to the data, but this time one for each species, and add further theme and labeling modifications.
ggplot(iris,  ## Create plot for the iris dataset
       aes(Sepal.Length,  ## x-axis: sepal length
           Petal.Length,  ## y-axis: petal length
           color = Species)) +  ## Color by species; also groups the regression lines
  geom_point() +  ## Scatter plot
  geom_smooth(method = lm, se = FALSE) +  ## Regression lines without confidence bands
  theme_pander() +  ## Use the Pander theme
  theme(text = element_text(family = "serif"),  ## Serif font (often Times New Roman) for all text
        axis.ticks = element_line(color = "black",  ## Black tick marks
                                  size = 0.7),  ## Width of tick marks
        legend.position = "bottom",  ## Move legend to the bottom of the plot
        legend.title = element_text(face = "plain"),  ## Legend title was italic; set to plain
        plot.title = element_text(size = 14,  ## Font size of plot title
                                  face = "plain")) +  ## Plain (not bold) plot title
  labs(title = "Relationship between Petal and Sepal Length",  ## Plot title
       subtitle = "Species level comparison",  ## Plot subtitle
       x = "Iris Sepal Length",  ## x-axis title
       y = "Iris Petal Length")  ## y-axis title
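Because color = Species sits in the plot-level aes(), geom_smooth() fits one line per species. The per-species fits can be approximated in base R as below. This is a sketch only, and `fits` and `slopes` are names introduced here for illustration.

```r
## Split iris by species and fit a separate linear model to each subset,
## mirroring the one-line-per-species behavior of the plot.
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Petal.Length ~ Sepal.Length, data = d))

## Pull out the slope of each species-level regression line.
slopes <- sapply(fits, function(f) coef(f)[["Sepal.Length"]])
slopes
```

Comparing the slopes shows how the relationship between sepal and petal length differs across species, which is the pattern the colored lines communicate.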