Problem Set 1

Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to introduce you to R markdown and to complete and explain basic plots before moving on to more complicated ways to graph data.

A couple of tips: there may be preprocessing involved in your graphics, so you may have to compute summaries or other calculations to prepare the data. Include those in your work.
To ensure accuracy, pay close attention to axes and labels. You will be evaluated on the accuracy of your graphics.

The final product of your homework (this file) should include a short summary of each graphic.

To submit this homework, create the document in RStudio using the knitr package (button included in RStudio) and then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Moodle. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.

Questions

Find the mtcars data in R. This is the dataset that you will use to create your graphics.

Create a pie chart showing the proportion of cars from the mtcars data set that have different cylinder (cyl) values.

# place the code to import graphics here
# Inspired by: https://www.statmethods.net/graphs/pie.html
data("mtcars")
library(plotrix) # The pie3D() function in the plotrix package provides 3D exploded pie charts

## Warning: package 'plotrix' was built under R version 3.5.3

slices <- table(mtcars$cyl) # number of cars per cylinder count
lbls <- names(slices)
prop <- slices / length(mtcars$cyl) * 100 # share of all cars, in percent
lbls <- paste(lbls, prop) # add proportion to labels
lbls <- paste(lbls, "%", sep = "") # add % to labels
pie3D(slices, labels = lbls, explode = 0.1, main = "Pie chart of different cylinder (cyl) values")

From the pie chart, we can see that 4-cylinder cars account for 34.375% of the data set (11 of 32 cars), 6-cylinder cars for 21.875% (7 of 32), and 8-cylinder cars for 43.75% (14 of 32).
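As a quick check on the percentages quoted above, the same shares can be computed without plotting. This is a minimal sketch in base R; the names cyl_counts and cyl_pct are illustrative and not part of the original solution, and datasets::mtcars is used so the check reads the untouched data set.

```r
# Count cars per cylinder value in the unmodified data set
cyl_counts <- table(datasets::mtcars$cyl)

# Convert counts to percentages of all cars
cyl_pct <- 100 * prop.table(cyl_counts)

print(cyl_counts)
print(cyl_pct)
```

Using prop.table() avoids dividing by the row count by hand, so the shares always sum to 100.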

Create a bar graph that shows the number of each carb type in mtcars.

# place the code to import graphics here
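carb_counts <- table(mtcars$carb) # cars per carburetor count; named to avoid masking base::c()
barplot(carb_counts, main = "The number of each carb type in mtcars", col = "lightblue", 
        xlab = "Number of carburetors", ylab = "Number of cars")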

From the bar graph, we can see that 10 of the 32 cars have 2 carburetors and another 10 have 4 carburetors, while the 6-carburetor and 8-carburetor types each appear in only one car.
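The counts read off the bar graph can be confirmed directly from the frequency table. The sketch below is only an illustration; carb_tab and rare_types are hypothetical names introduced here.

```r
# Frequency of each carburetor count in the unmodified data set
carb_tab <- table(datasets::mtcars$carb)
print(carb_tab)

# Carburetor counts that occur in exactly one car
rare_types <- names(carb_tab)[as.vector(carb_tab) == 1]
print(rare_types)
```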

Next show a stacked bar graph of the number of each gear type and how they are further divided out by cyl.

# place the code to import graphics here
# Inspired by: https://www.statmethods.net/graphs/bar.html
m <- table(mtcars$cyl, mtcars$gear) # rows = cylinders, columns = gears
barplot(m, main = "Car Distribution by Gears and Cylinders", 
        xlab = "Number of forward gears", ylab = "Number of cars",
        col = c("lightblue", "yellow", "orange"),
        legend.text = rownames(m), args.legend = list(title = "Cylinders"))

From the stacked bar graph, we can see that 15 of the 32 cars have 3 gears and 12 have 4 gears, and that none of those 12 four-gear cars has 8 cylinders.
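The totals behind each stacked bar can be checked by adding margins to the same kind of cross-tabulation. This is a sketch under the assumption that a plain count table is enough; gear_by_cyl is an illustrative name.

```r
# Cross-tabulate cylinders (rows) against forward gears (columns)
gear_by_cyl <- table(cyl = datasets::mtcars$cyl, gear = datasets::mtcars$gear)

# addmargins() appends row and column sums, i.e. the bar heights
print(addmargins(gear_by_cyl))
```

The "Sum" row gives the height of each bar, and each cell gives the size of one coloured segment.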

Draw a scatter plot showing the relationship between wt and mpg.

# place the code to import graphics here
# Inspired by: https://www.statmethods.net/graphs/scatterplot.html
library(car)

## Warning: package 'car' was built under R version 3.5.3

## Loading required package: carData

scatterplot(mpg ~ wt, data = mtcars, 
            xlab = "Weight (1000 lbs)", ylab = "Miles Per Gallon", 
            main = "Relationship between wt and mpg")

From the scatter plot, we can see that the fitted line slopes downward, which means heavier vehicles get fewer miles per gallon. This makes sense, since a heavier car needs more energy to move.
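To put a number on the downward trend, one option is to fit a simple linear regression and compute the correlation. This is a minimal sketch, not the method car::scatterplot() uses internally; cars, fit, slope and r are names introduced here for illustration.

```r
cars <- datasets::mtcars

# Least-squares line of mpg on weight
fit <- lm(mpg ~ wt, data = cars)
slope <- unname(coef(fit)["wt"]) # change in mpg per extra 1000 lbs

# Pearson correlation between weight and mpg
r <- cor(cars$wt, cars$mpg)

print(c(slope = slope, correlation = r))
```

A negative slope together with a strongly negative correlation supports the reading of the plot.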

Design a visualization of your choice using the data and write a brief summary about why you chose that visualization.

Not every car buyer is familiar with all the technical terms used to describe a vehicle. One thing buyers do know is that they want a relatively economical car that can travel farther on less fuel and maintenance. So I want to compare normalised mpg across these 32 models to help people make an economical purchase decision according to their own preferences and budget. Below, the mpg from the mtcars dataset is normalised by computing a z score. Vehicles with a normalised mpg z score above zero are marked green, and those below zero are marked red.
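For reference, the z score of a value is its distance from the mean measured in standard deviations. The sketch below computes it by hand and compares it with base R's scale(); the names cars, mpg_z_manual and mpg_z_scale are illustrative only.

```r
cars <- datasets::mtcars

# z score by hand: (value - mean) / standard deviation
mpg_z_manual <- (cars$mpg - mean(cars$mpg)) / sd(cars$mpg)

# scale() centres and scales a column the same way by default
mpg_z_scale <- as.numeric(scale(cars$mpg))

print(all.equal(mpg_z_manual, mpg_z_scale))
```

After normalisation the scores have mean 0 and standard deviation 1, so the sign of a score shows whether a car is above or below the average mpg.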

# place the code to import graphics here
# Inspired by (a very good resource for visualisation):
# http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
mtcars$v_name <- rownames(mtcars)  # create new column for vehicle name
mtcars$mpg_z <- round((mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg), 2) # compute normalised mpg z score
mtcars$mpg_type <- ifelse(mtcars$mpg_z < 0, "below", "above") # flag z score < 0 as below, otherwise as above
mtcars <- mtcars[order(mtcars$mpg_z), ] # sort rows by mpg_z, ascending
mtcars$v_name <- factor(mtcars$v_name, levels = mtcars$v_name) # convert to factor so the plot keeps the sorted order

# Diverging Barchart
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.5.3

ggplot(mtcars, aes(x = v_name, y = mpg_z, label = mpg_z)) + 
  geom_bar(stat = 'identity', aes(fill = mpg_type), width = .5) + 
  ylab("Normalised mpg Z score") + xlab("Vehicle name") + # labels follow the aesthetics; coord_flip() at the end swaps where they are drawn
  scale_fill_manual(name = "Mileage", 
                    labels = c("Above Average", "Below Average"), 
                    values = c("above" = "green", "below" = "red")) + 
  labs(title = "The comparison of normalised mpg between 32 vehicle models") + 
  coord_flip()

Basic Visualization Process

Jindong Zhao

2019-09-02
