Objectives

The objective of this problem set is to gain experience working with the ggplot2 package for data visualization.

Vis 1

This graphic is a traditional stacked bar chart of the mpg dataset, which ships with the ggplot2 package. Each bar counts the vehicles in one class and is split by transmission type. The fill legend is retitled "Transmission" for clarity.

# Explore the datasets (datasets is a base R package that is attached by default, so it needs no installation)
library(datasets)
library(tidyverse)
## -- Attaching packages ------------------------------------------ tidyverse 1.2.1 --
## √ ggplot2 2.2.1     √ purrr   0.2.4
## √ tibble  1.4.2     √ dplyr   0.7.4
## √ tidyr   0.8.0     √ stringr 1.2.0
## √ readr   1.1.1     √ forcats 0.2.0
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'tidyr' was built under R version 3.4.3
## -- Conflicts --------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...
mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl   
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr>
##  1 audi         a4        1.80  1999     4 auto(l~ f        18    29 p    
##  2 audi         a4        1.80  1999     4 manual~ f        21    29 p    
##  3 audi         a4        2.00  2008     4 manual~ f        20    31 p    
##  4 audi         a4        2.00  2008     4 auto(a~ f        21    30 p    
##  5 audi         a4        2.80  1999     6 auto(l~ f        16    26 p    
##  6 audi         a4        2.80  1999     6 manual~ f        18    26 p    
##  7 audi         a4        3.10  2008     6 auto(a~ f        18    27 p    
##  8 audi         a4 quat~  1.80  1999     4 manual~ 4        18    26 p    
##  9 audi         a4 quat~  1.80  1999     4 auto(l~ 4        16    25 p    
## 10 audi         a4 quat~  2.00  2008     4 manual~ 4        20    28 p    
## # ... with 224 more rows, and 1 more variable: class <chr>
# Load ggplot2 (already attached by library(tidyverse) above; repeated here for clarity)
library(ggplot2)

# Create the stacked bar visualization and title the fill legend "Transmission"
Vis1 <- ggplot(mpg, aes(x = class, fill = trans)) +
  geom_bar() +
  guides(fill = guide_legend(title = "Transmission"))
Vis1
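Because geom_bar() counts rows by default, each bar's height in Vis1 is the number of cars in a class and each coloured segment is the count for one transmission type. A minimal sketch that tabulates those counts directly, assuming ggplot2 is installed so that ggplot2::mpg is available:

```r
# Load the mpg data bundled with ggplot2 (assumes ggplot2 is installed)
mpg <- ggplot2::mpg

# Cross-tabulate vehicle class against transmission type;
# each cell is the height of one stacked segment in Vis1
counts <- table(class = mpg$class, trans = mpg$trans)

# Row totals give the full height of each bar
bar_heights <- rowSums(counts)
print(bar_heights)
```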

Vis 2

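This boxplot, also built from the mpg dataset, shows the distribution of highway fuel efficiency for each manufacturer. The axes are flipped so the manufacturer names are readable, both axis labels are renamed, and the default theme is replaced with theme_classic().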

Vis2 <- ggplot(mpg, aes(x = manufacturer, y = hwy)) +
  geom_boxplot() +
  coord_flip() +  # put manufacturers on the vertical axis
  theme_classic() +
  scale_x_discrete(name = "Vehicle Manufacturer") +
  scale_y_continuous(name = "Highway Fuel Efficiency (miles/gallon)")
Vis2
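The thick line inside each box is the median highway mileage for that manufacturer. A short sketch that computes those medians with base R, again assuming ggplot2 provides mpg. If a ranked view is preferred, the boxes could also be ordered by median with reorder(manufacturer, hwy, FUN = median) inside aes(); that tweak is not applied to Vis2 above.

```r
# Load the mpg data bundled with ggplot2 (assumes ggplot2 is installed)
mpg <- ggplot2::mpg

# Median highway mpg per manufacturer: the centre line of each box in Vis2
medians <- aggregate(hwy ~ manufacturer, data = as.data.frame(mpg), FUN = median)

# Sort from least to most efficient so the ranking is easy to read
medians <- medians[order(medians$hwy), ]
print(medians)
```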

Vis 3

This graphic uses diamonds, another dataset bundled with the ggplot2 package. It overlays one price density curve per cut and applies theme_economist() from the additional ggthemes package.

library(ggplot2)
# Install ggthemes only if it is missing, then load it below
if (!requireNamespace("ggthemes", quietly = TRUE)) {
  install.packages("ggthemes", repos = "http://cran.us.r-project.org")
}
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.4.4
diamonds
## # A tibble: 53,940 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1 0.230 Ideal     E     SI2      61.5  55.0   326  3.95  3.98  2.43
##  2 0.210 Premium   E     SI1      59.8  61.0   326  3.89  3.84  2.31
##  3 0.230 Good      E     VS1      56.9  65.0   327  4.05  4.07  2.31
##  4 0.290 Premium   I     VS2      62.4  58.0   334  4.20  4.23  2.63
##  5 0.310 Good      J     SI2      63.3  58.0   335  4.34  4.35  2.75
##  6 0.240 Very Good J     VVS2     62.8  57.0   336  3.94  3.96  2.48
##  7 0.240 Very Good I     VVS1     62.3  57.0   336  3.95  3.98  2.47
##  8 0.260 Very Good H     SI1      61.9  55.0   337  4.07  4.11  2.53
##  9 0.220 Fair      E     VS2      65.1  61.0   337  3.87  3.78  2.49
## 10 0.230 Very Good H     VS1      59.4  61.0   338  4.00  4.05  2.39
## # ... with 53,930 more rows
Vis3 <- ggplot(diamonds) +
  # size sets the curve outline width (named linewidth in ggplot2 >= 3.4.0)
  geom_density(aes(x = price, color = cut, fill = cut), alpha = 0.3, size = 0.5) +
  theme_economist() +
  labs(x = "Diamond Price (USD)", y = "Density", title = "Diamond Price Density")
Vis3
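Each curve in Vis3 is a kernel density estimate of price, computed separately for one cut. To our understanding, ggplot2's stat_density() defaults to a Gaussian kernel with the "nrd0" bandwidth rule, which are also the defaults of base R's density(). The sketch below should therefore approximate the curve for Ideal diamonds, assuming ggplot2 is installed for the diamonds data:

```r
# Load the diamonds data bundled with ggplot2 (assumes ggplot2 is installed)
diamonds <- ggplot2::diamonds

# Prices of Ideal-cut diamonds only; Vis3 draws one such curve per cut
ideal_prices <- diamonds$price[diamonds$cut == "Ideal"]

# Gaussian kernel with the nrd0 bandwidth rule (both are density() defaults)
dens <- density(ideal_prices, bw = "nrd0", kernel = "gaussian")

# Riemann sum over the evaluation grid: the estimate integrates to about 1
area <- sum(dens$y) * diff(dens$x[1:2])
print(area)
```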

Vis 4

For this plot we switch to a scatter plot idiom, showing iris petal length against sepal length. geom_smooth(method = lm) fits a linear model inside the plot and draws the fitted line with its confidence band, which is shown by default. The labels are edited and theme_minimal() is applied.

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Vis4 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm") +  # least-squares line with a confidence band
  labs(title = "Relationship between Petal and Sepal Length",
       x = "Iris Sepal Length (cm)", y = "Iris Petal Length (cm)") +
  theme_minimal()
Vis4
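The line drawn by geom_smooth(method = lm) is an ordinary least-squares fit of petal length on sepal length. Fitting the same model with lm() exposes its coefficients and R² directly. With a single predictor, R² equals the squared Pearson correlation, and the sketch checks this. iris ships with base R, so nothing beyond the stats package is needed:

```r
# Same model geom_smooth(method = lm) fits: y = Petal.Length, x = Sepal.Length
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Intercept and slope of the plotted line
print(coef(fit))

# R-squared of the fit; with one predictor it equals cor(x, y)^2
r2 <- summary(fit)$r.squared
r <- cor(iris$Sepal.Length, iris$Petal.Length)
print(c(r_squared = r2, cor_squared = r^2))
```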

Vis 5

Finally, this vis extends the previous example by plotting the same data with an additional channel, color, to communicate species-level differences. Because Species is mapped to color, geom_smooth() fits a separate linear model for each species, and se = FALSE turns off the confidence bands. The plot also adds a subtitle, applies theme_tufte() from ggthemes, and moves the legend to the bottom.

Vis5 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # one fit per species, no band
  labs(title = "Relationship between Petal and Sepal Length",
       subtitle = "Species level comparison",
       x = "Iris Sepal Length (cm)", y = "Iris Petal Length (cm)") +
  theme_tufte() +
  theme(legend.position = "bottom")
Vis5
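Mapping Species to color splits the data into three groups, and geom_smooth() fits one model per group. The sketch below reproduces this with split() and lm() so the three slopes can be compared numerically. It uses base R only:

```r
# One data frame per species, mirroring the groups created by color = Species
by_species <- split(iris, iris$Species)

# Fit Petal.Length ~ Sepal.Length separately within each species
coefs <- t(sapply(by_species, function(d) {
  coef(lm(Petal.Length ~ Sepal.Length, data = d))
}))

# One row per species: intercept and slope of each coloured line in Vis5
print(coefs)
```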