Objectives

The objectives of this problem set is to gain experience working with the ggplot2 package for data visualization. To do this I have provided a series of graphics, all created using the ggplot2 package. Your objective for this assignment will be write the code necessary to exactly recreate the provided graphics.

When completed submit a link to your file on rpubs.com. Be sure to include echo = TRUE for each graphic so that I can see the visualization and the code required to create it.

Vis 1. This graphic is a traditional stacked bar chart. This graphic works on the mpg dataset, which is built into the ggplot2 library. This means that you can access it simply by ggplot(mpg, ….). There is one modification above default in this graphic, I renamed the legend for more clarity.

library(ggplot2)

data("mpg")
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl      trans   drv   cty   hwy    fl
##          <chr> <chr> <dbl> <int> <int>      <chr> <chr> <int> <int> <chr>
## 1         audi    a4   1.8  1999     4   auto(l5)     f    18    29     p
## 2         audi    a4   1.8  1999     4 manual(m5)     f    21    29     p
## 3         audi    a4   2.0  2008     4 manual(m6)     f    20    31     p
## 4         audi    a4   2.0  2008     4   auto(av)     f    21    30     p
## 5         audi    a4   2.8  1999     6   auto(l5)     f    16    26     p
## 6         audi    a4   2.8  1999     6 manual(m5)     f    18    26     p
## # ... with 1 more variables: class <chr>
summary(mpg)
##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00
ggplot(mpg,aes(class, fill = trans)) + geom_bar() + scale_fill_discrete(name = "Transmission")

qplot(data = mpg,class, fill = trans)+ scale_fill_discrete(name = "Transmission")

Vis 2. This boxplot is also built using the mpg dataset. Notice the changes in axis labels, and an altered theme_XXXX

qplot(manufacturer, hwy, data=mpg, geom = "boxplot",xlab = "Vehicle Manufacturer", ylab = "Highway Fuel Efficiency (miles/gallon)")+ 
coord_flip()+
theme(panel.background = element_rect(fill = "white"))

Vis 3. This graphic is built with another dataset diamonds a dataset also built into the ggplot2 package. For this one I used an additional package called library(ggthemes) check it out to reproduce this view.

library(ggthemes)

data("diamonds")
head(diamonds)
## # A tibble: 6 x 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
summary(diamonds)
##      carat               cut        color        clarity     
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655  
##                                     J: 2808   (Other): 2531  
##      depth           table           price             x         
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
##                                                                  
##        y                z         
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.710   Median : 3.530  
##  Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :58.900   Max.   :31.800  
## 
ggplot(diamonds, aes(price, fill=cut, color=cut)) + geom_density(alpha=0.2) + theme_economist() +
scale_colour_economist() + labs(title = "Diamond Price Density",  x = "Diamond Price (USD)", y ="Density")

Vis 4. For this plot we are changing vis idioms to a scatter plot framework. Additionally, I am using ggplot2 package to fit a linear model to the data all within the plot framework. Three are edited labels and theme modifications as well.

data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point(shape = 19) + geom_smooth(method = "lm", se = TRUE)+ labs(title = "Relationshop between Petal and Sepal Length", x = "Iris Petal Length", y ="Iris Sepal Length")+ theme(panel.background = element_rect(fill = "white"), panel.grid.major = element_line(colour = "grey"), panel.grid.minor = element_line(colour = "grey"))

Vis 5. Finally, in this vis I extend on the last example, by plotting the same data but using an additional channel to communicate species level differences. Again I fit a linear model to the data but this time one for each species, and add additional theme and labeling modicitations.

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + 
geom_point() + 
geom_smooth(method = "lm", se = F)+
theme(panel.background = element_rect(fill = "white"), legend.key = element_rect(fill = "white"))+
labs(title = "Relationshop between Petal and Sepal Length", subtitle ="Species Level Comparison", x = "Iris Petal Length", y ="Iris Sepal Length")