The objectives of this problem set is to gain experience working with the ggplot2 package for data visualization. To do this I have provided a series of graphics, all created using the ggplot2 package. Your objective for this assignment will be write the code necessary to exactly recreate the provided graphics.
When completed submit a link to your file on rpubs.com. Be sure to include echo = TRUE for each graphic so that I can see the visualization and the code required to create it.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
This graphic is a traditional stacked bar chart. This graphic works on the mpg dataset, which is built into the ggplot2 library. This means that you can access it simply by ggplot(mpg, ….). There is one modification above default in this graphic, I renamed the legend for more clarity.
data(mpg)
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: chr "audi" "audi" "audi" "audi" ...
## $ model : chr "a4" "a4" "a4" "a4" ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr "f" "f" "f" "f" ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr "p" "p" "p" "p" ...
## $ class : chr "compact" "compact" "compact" "compact" ...
mpg %>%
group_by(class) %>%
select(trans) %>%
summarise(
count = n()
)
## Adding missing grouping variables: `class`
## # A tibble: 7 x 2
## class count
## <chr> <int>
## 1 2seater 5
## 2 compact 47
## 3 midsize 41
## 4 minivan 11
## 5 pickup 33
## 6 subcompact 35
## 7 suv 62
ggplot() +
geom_bar(data = mpg,aes(x = class, fill=factor(trans))) +
geom_text(stat='count', aes(label=..count..), vjust=-1) +
scale_fill_discrete(name="Transmission")
## reference: https://stackoverflow.com/questions/26553526/how-to-add-frequency-count-labels-to-the-bars-in-a-bar-graph-using-ggplot2
This boxplot is also built using the mpg dataset. Notice the changes in axis labels, and an altered theme_XXXX (Reference used: https://rstudio-pubs-static.s3.amazonaws.com/3364_d1a578f521174152b46b19d0c83cbe7e.html)
ggplot() +
geom_boxplot(data = mpg,aes(y=hwy, x = manufacturer)) +
coord_flip() +
labs( y = "Highway Fuel Efficiency (miles/gallon)", x = "Vehicle Manufacturer")
This graphic is built with another dataset diamonds a dataset also built into the ggplot2 package. For this one I used an additional package called library(ggthemes) check it out to reproduce this view.
#install.packages('ggthemes', dependencies = TRUE)
library(ggthemes)
data(diamonds)
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
# Line plot with multiple groups: http://www.sthda.com/english/wiki/ggplot2-line-plot-quick-start-guide-r-software-and-data-visualization
## density plot: http://ggplot2.tidyverse.org/reference/geom_density.html
ggplot(diamonds,
aes(price, col=cut, fill = cut)
) +
geom_density(
alpha = 0.2) + ## alpha needs to be within the geom_density; if it's within ggplot(), the transparity doesn't show
theme_economist() +
labs(title = "Diamond Price Density",
x = "Diamond Price (USD)",
y = "Density")
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
geom_point() +
geom_smooth(method = "lm") +
ggtitle("Relationship between Petal and Sepal Length") +
labs(y = "Iris Petal Length", x = "Iris Sepal Length")
ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
geom_point() +
geom_smooth(se = FALSE, method = "lm") +
ggtitle("Relationship between Petal and Sepal Length") +
labs(y = "Iris Petal Length", x = "Iris Sepal Length", subtitle = "Species level comparison") +
theme(legend.position="bottom", text=element_text(family="Times New Roman"))