The objective of this problem set is to gain experience working with the ggplot2 package for data visualization.
This graphic is a traditional stacked bar chart. It uses the mpg dataset, which is built into the ggplot2 package, and the legend title is renamed for clarity.
# Explore the dataset
# datasets is a base package that ships with R, so it does not need to be installed
library(datasets)
library(tidyverse)
## -- Attaching packages ------------------------------------------ tidyverse 1.2.1 --
## √ ggplot2 2.2.1 √ purrr 0.2.4
## √ tibble 1.4.2 √ dplyr 0.7.4
## √ tidyr 0.8.0 √ stringr 1.2.0
## √ readr 1.1.1 √ forcats 0.2.0
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'tidyr' was built under R version 3.4.3
## -- Conflicts --------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: chr "audi" "audi" "audi" "audi" ...
## $ model : chr "a4" "a4" "a4" "a4" ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr "f" "f" "f" "f" ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr "p" "p" "p" "p" ...
## $ class : chr "compact" "compact" "compact" "compact" ...
mpg
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr>
## 1 audi a4 1.80 1999 4 auto(l~ f 18 29 p
## 2 audi a4 1.80 1999 4 manual~ f 21 29 p
## 3 audi a4 2.00 2008 4 manual~ f 20 31 p
## 4 audi a4 2.00 2008 4 auto(a~ f 21 30 p
## 5 audi a4 2.80 1999 6 auto(l~ f 16 26 p
## 6 audi a4 2.80 1999 6 manual~ f 18 26 p
## 7 audi a4 3.10 2008 6 auto(a~ f 18 27 p
## 8 audi a4 quat~ 1.80 1999 4 manual~ 4 18 26 p
## 9 audi a4 quat~ 1.80 1999 4 auto(l~ 4 16 25 p
## 10 audi a4 quat~ 2.00 2008 4 manual~ 4 20 28 p
## # ... with 224 more rows, and 1 more variable: class <chr>
# Load the relevant package
library(ggplot2)
# Create the stacked bar visualization with legend title = Transmission
Vis1 <- ggplot(mpg, aes(class, fill = trans)) + geom_bar() + guides(fill = guide_legend(title = "Transmission"))
Vis1
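As a quick check on what the stacked bars encode, the counts can be tabulated directly. The sketch below assumes only that ggplot2 is installed; the names class_trans_counts, class_totals, and Vis1_prop are hypothetical and not part of the original problem set. It also shows one possible variant that rescales each bar to proportions with position = "fill".

```r
library(ggplot2)

# Counts that geom_bar() stacks: number of rows per class and transmission type
class_trans_counts <- table(mpg$class, mpg$trans)

# Total bar height for each vehicle class
class_totals <- rowSums(class_trans_counts)
print(class_totals)

# Hypothetical variant: position = "fill" rescales every bar to height 1,
# so the segments show proportions instead of raw counts
Vis1_prop <- ggplot(mpg, aes(class, fill = trans)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion", fill = "Transmission")
```

Printing Vis1_prop draws the proportional version, which makes it easier to compare the transmission mix across classes of very different sizes.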
This boxplot is also built using the mpg dataset. Notice the changed axis labels and the altered theme, here theme_classic().
Vis2 <- ggplot(mpg, aes(x = manufacturer, y = hwy)) + coord_flip() + geom_boxplot() + guides(fill = FALSE) + theme_classic() + scale_x_discrete(name = "Vehicle Manufacturer") + scale_y_continuous(name = "Highway Fuel Efficiency (miles/gallon)")
Vis2
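The boxplot lists manufacturers alphabetically. One optional refinement, sketched here as an assumption rather than as part of the assignment, is to order them by median highway mileage using base R's reorder(). The names mpg_ordered and Vis2_ordered are hypothetical.

```r
library(ggplot2)

# Copy the data and turn manufacturer into a factor whose levels are
# sorted by each manufacturer's median highway mileage
mpg_ordered <- mpg
mpg_ordered$manufacturer <- reorder(mpg_ordered$manufacturer,
                                    mpg_ordered$hwy,
                                    FUN = median)

# Same boxplot as before, but the boxes now appear in median order
Vis2_ordered <- ggplot(mpg_ordered, aes(x = manufacturer, y = hwy)) +
  geom_boxplot() +
  coord_flip() +
  theme_classic() +
  labs(x = "Vehicle Manufacturer",
       y = "Highway Fuel Efficiency (miles/gallon)")
```

Sorting by the median keeps the ordering consistent with the line drawn inside each box, which tends to make the ranking easier to read.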
This graphic is built with another dataset, diamonds, which is also built into the ggplot2 package. It also uses an additional package, ggthemes.
library(ggplot2)
install.packages("ggthemes", repos = "http://cran.us.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/_f/hk0n1w157s5bfmt2ykvxt1yh0000gn/T//RtmpiiWfkD/downloaded_packages
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.4.4
diamonds
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.230 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
## 2 0.210 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
## 3 0.230 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58.0 334 4.20 4.23 2.63
## 5 0.310 Good J SI2 63.3 58.0 335 4.34 4.35 2.75
## 6 0.240 Very Good J VVS2 62.8 57.0 336 3.94 3.96 2.48
## 7 0.240 Very Good I VVS1 62.3 57.0 336 3.95 3.98 2.47
## 8 0.260 Very Good H SI1 61.9 55.0 337 4.07 4.11 2.53
## 9 0.220 Fair E VS2 65.1 61.0 337 3.87 3.78 2.49
## 10 0.230 Very Good H VS1 59.4 61.0 338 4.00 4.05 2.39
## # ... with 53,930 more rows
Vis3 <- ggplot(diamonds) + geom_density(aes(price, color = cut, fill = cut), alpha = 0.3, size = 0.5) + theme_economist() + labs(x = "Diamond Price (USD)", y = "Density", title = "Diamond Price Density")
Vis3
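To get a feel for what each density curve represents, a kernel density estimate for a single cut can be computed with base R's density(). This is only a rough sketch: it uses the default Gaussian kernel and bandwidth, which should be close to, but not necessarily identical to, what the plot layer draws. The names ideal_prices, ideal_density, and area are hypothetical.

```r
library(ggplot2)

# Prices for one cut level only
ideal_prices <- diamonds$price[diamonds$cut == "Ideal"]

# Kernel density estimate with the default Gaussian kernel and bandwidth
ideal_density <- density(ideal_prices)

# density() evaluates on an equally spaced grid, so a Riemann sum
# approximates the area under the curve, which should be close to 1
grid_step <- diff(ideal_density$x[1:2])
area <- sum(ideal_density$y) * grid_step
print(area)
```

Because each curve integrates to roughly 1 regardless of group size, the plot compares the shape of the price distributions, not the number of diamonds in each cut.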
For this plot we switch the visual idiom to a scatter plot, using the built-in iris dataset. Additionally, ggplot2 is used to fit a linear model to the data within the plot itself. The labels and theme are modified as well.
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Vis4 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point() + geom_smooth(method = lm) + labs(title = "Relationship between Petal and Sepal Length", x = "Iris Sepal Length", y = "Iris Petal Length") + theme_minimal()
Vis4
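The line drawn by geom_smooth(method = lm) corresponds to an ordinary least-squares fit of the y variable on the x variable. A minimal sketch of the same fit outside the plot, using base R only, is shown below. The names fit, fit_coefs, and fit_r2 are hypothetical.

```r
# Fit the same simple linear model that the smoothing layer draws
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Intercept and slope of the fitted line
fit_coefs <- coef(fit)
print(fit_coefs)

# Proportion of variance in petal length explained by sepal length
fit_r2 <- summary(fit)$r.squared
print(fit_r2)
```

Inspecting the coefficients this way gives the numbers behind the line, which the plot alone does not report.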
Finally, this visualization extends the last example by plotting the same data with an additional channel, color, to communicate species-level differences. A linear model is again fit to the data, but this time one for each species, with additional theme and labeling modifications.
Vis5 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + geom_smooth(method = lm, se = FALSE) + labs(title = "Relationship between Petal and Sepal Length", subtitle = "Species level comparison", x = "Iris Sepal Length", y = "Iris Petal Length") + theme_tufte() + theme(legend.position = "bottom")
Vis5
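Because color is mapped to Species, the smoothing layer fits one line per species. The per-species fits can be reproduced outside the plot by splitting the data and fitting a separate model to each piece. This is a sketch in base R, and the name species_coefs is hypothetical.

```r
# Split iris by species and fit one linear model per subset
species_coefs <- sapply(split(iris, iris$Species), function(d) {
  coef(lm(Petal.Length ~ Sepal.Length, data = d))
})

# Rows are the intercept and slope; columns are the three species
print(round(species_coefs, 3))
```

Comparing the slopes across columns shows how the relationship between sepal and petal length differs by species, which is the comparison the colored lines are meant to convey.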