Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to introduce you to R Markdown and to have you complete and explain basic plots before moving on to more complicated ways to graph data.

The final product of your homework (this file) should include a short summary of each graphic.

Each question is worth 5 points.

To submit this homework, create the document in RStudio using the knitr package (a Knit button is included in RStudio), then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Questions

library(knitr)
library(formatR)
opts_chunk$set(tidy.opts = list(width.cutoff = 100), tidy=TRUE)
library(ggplot2)
library(dplyr)

Find the mpg data in R. This is the dataset that you will use for the first three questions.

summary(mpg)
##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00
mpg
## # A tibble: 234 x 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto~ f        18    29 p     comp~
##  2 audi         a4           1.8  1999     4 manu~ f        21    29 p     comp~
##  3 audi         a4           2    2008     4 manu~ f        20    31 p     comp~
##  4 audi         a4           2    2008     4 auto~ f        21    30 p     comp~
##  5 audi         a4           2.8  1999     6 auto~ f        16    26 p     comp~
##  6 audi         a4           2.8  1999     6 manu~ f        18    26 p     comp~
##  7 audi         a4           3.1  2008     6 auto~ f        18    27 p     comp~
##  8 audi         a4 quattro   1.8  1999     4 manu~ 4        18    26 p     comp~
##  9 audi         a4 quattro   1.8  1999     4 auto~ 4        16    25 p     comp~
## 10 audi         a4 quattro   2    2008     4 manu~ 4        20    28 p     comp~
## # ... with 224 more rows
  1. Create a box plot using ggplot showing engine displacement (displ) for each transmission type (trans) from the mpg data set. Hint: Can you figure out how to rotate the x-axis categories so they are all readable?
# place the code to import graphics here
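# Draw one box per transmission type (no extra grouping on displ) and rotate the x-axis labels
# so every category is readable.
ggplot(mpg, aes(trans, displ)) + geom_boxplot(color = "blue") + theme(axis.text.x = element_text(angle = 45,
    hjust = 1))

# This graph compares the distribution of engine displacement across transmission types. Because
# trans is categorical, it shows differences in median and spread between groups rather than a
# linear relationship.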
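
To back up the reading of the box plot with numbers, one option is to summarise displacement per transmission type. This is only a sketch; the object name `displ_by_trans` and the choice of median and IQR as summaries are assumptions, not part of the assignment.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# One row per transmission type: group size, median displacement, and spread (IQR).
displ_by_trans <- mpg %>%
  group_by(trans) %>%
  summarise(n = n(), median_displ = median(displ), iqr_displ = IQR(displ)) %>%
  arrange(desc(median_displ))

print(displ_by_trans)
```

The `n` column is worth checking before comparing boxes, since a box built from only a few vehicles says little about that transmission type.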
  2. Create a histogram or bar graph using ggplot that shows the frequency of each class type in mpg.
# place the code to import graphics here
ggplot(mpg, aes(x = class)) + geom_bar(width = 0.35)

# A simple bar graph that shows how many vehicles in the data set fall into each class.
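
The bar heights can be cross-checked against a plain frequency table. A minimal sketch, assuming dplyr is loaded as in the setup chunk; the name `class_counts` is illustrative.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Count the rows in each vehicle class, largest class first.
class_counts <- mpg %>% count(class, sort = TRUE)

print(class_counts)
```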
  3. Next, show a stacked bar graph using ggplot that shows the frequency of each cyl type within class. Hint: You might have to use group or convert cyl to a factor (as.factor).
# place the code to import graphics here
ggplot(mpg, aes(x = factor(class), fill = factor(cyl))) + xlab("Class") + ylab("Cyl Type Count") +
    geom_bar() + labs(fill = "Cyl Type")

# The chart shows the mix of cylinder counts within each vehicle class. For SUVs, 8-cylinder
# vehicles appear to be the largest group, which makes sense because buyers would want more
# power for utility vehicles.
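
Because the classes differ in size, raw stacked counts can make the cylinder mix hard to compare between classes. The sketch below, offered as one possible follow-up rather than a required part of the answer, prints the cross-tabulation and draws the same bars as within-class shares using `position = "fill"`. The names `class_by_cyl` and `p_share` are assumptions.

```r
library(ggplot2)

# Cross-tabulate vehicle class against number of cylinders.
class_by_cyl <- table(class = mpg$class, cyl = mpg$cyl)
print(class_by_cyl)

# Row-wise shares: within each class, the fraction of vehicles with each cylinder count.
print(round(prop.table(class_by_cyl, margin = 1), 2))

# The same data as a proportional stacked bar chart (every bar has height 1).
p_share <- ggplot(mpg, aes(x = class, fill = factor(cyl))) +
  geom_bar(position = "fill") +
  labs(x = "Class", y = "Share of vehicles", fill = "Cyl Type")
print(p_share)
```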
  4. Draw a scatter plot using ggplot showing the relationship between cty and hwy. Explain the utility or lack of utility of this graphic.
# place the code to import graphics here
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_jitter()

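# This graph is somewhat useful: it indicates a positive, roughly linear relationship between city
# and highway mileage. Because cty and hwy are integers, many points overlap, so jitter is used.
# Slight improvement: color the points by class and size them by the number of overlapping observations.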
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) + geom_count(position = "jitter")
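
The visual impression of a linear relationship can be checked with a correlation coefficient and a fitted line. This is a sketch under the assumption that a simple least-squares fit of hwy on cty is an adequate summary; the names `r_cty_hwy`, `fit`, and `p_fit` are illustrative.

```r
library(ggplot2)

# Pearson correlation between city and highway mileage.
r_cty_hwy <- cor(mpg$cty, mpg$hwy)
print(r_cty_hwy)

# Simple linear regression of highway mileage on city mileage.
fit <- lm(hwy ~ cty, data = mpg)
print(coef(fit))

# Scatter plot with point size showing overlap, plus the fitted regression line.
p_fit <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_count(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p_fit)
```

Using `geom_count()` without jitter keeps each point at its true integer position, so the size legend carries the overlap information instead of random noise.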

  5. Design a visualization of your choice with ggplot using the mpg data, and write a brief summary about why you chose that visualization.
# place the code to import graphics here
table(mpg$manufacturer)  # Number of vehicles in the data set for each manufacturer
## 
##       audi  chevrolet      dodge       ford      honda    hyundai       jeep 
##         18         19         37         25          9         14          8 
## land rover    lincoln    mercury     nissan    pontiac     subaru     toyota 
##          4          3          4         13          5         14         34 
## volkswagen 
##         27
ggplot(data = mpg) + geom_bar(mapping = aes(x = manufacturer, fill = manufacturer))

The graph above gives the audience a general idea of which car manufacturers are included in the data set and how often each appears. Dodge (37), Toyota (34), and Volkswagen (27) have the most vehicles in the data set.
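
Ranking the manufacturers is easier when the bars are sorted by count. The sketch below is one way to do that, assuming the counts are precomputed with dplyr and drawn with `geom_col()`; the names `manufacturer_counts`, `n_rows`, and `p_ranked` are illustrative.

```r
library(ggplot2)
library(dplyr)

# Number of rows (vehicle configurations) per manufacturer.
manufacturer_counts <- mpg %>% count(manufacturer, name = "n_rows")

# Horizontal bars ordered by count, so the most frequent manufacturer is at the top.
p_ranked <- ggplot(manufacturer_counts, aes(x = reorder(manufacturer, n_rows), y = n_rows)) +
  geom_col() +
  coord_flip() +
  labs(x = "Manufacturer", y = "Number of vehicles")
print(p_ranked)
```

Dropping the per-manufacturer fill color is a deliberate choice here: the x-axis already names each manufacturer, so the color legend would repeat the same information.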