ggplot basics

Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to introduce you to R Markdown and to have you complete and explain basic plots before moving on to more complicated ways to graph data.

A couple of tips: remember that there may be preprocessing involved in your graphics, so you may have to do summaries or calculations to prepare. Those should be included in your work.
To ensure accuracy, pay close attention to axes and labels. You will be evaluated based on the accuracy of your graphics.

The final product of your homework (this file) should include a short summary of each graphic.

To submit this homework you will create the document in RStudio, using the knitr package (button included in RStudio), and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.

Questions

Find the mtcars data in R. This is the dataset that you will use to create your graphics.

Create a pie chart using ggplot showing the proportion of cars from the mtcars data set that have different cylinder (cyl) values.

A: This is a pie chart: the red area represents the proportion of cars with cyl = 4, the green area represents the proportion of cars with cyl = 6, and the blue area represents the proportion of cars with cyl = 8. From the graph we can see that 8-cylinder cars have the highest proportion and 6-cylinder cars have the lowest proportion.

library(ggplot2)
mtcars$cyl <- factor(mtcars$cyl) # Create a categorical variable

pie_chart <- ggplot(mtcars, aes(x = "", fill = cyl)) + 
  geom_bar(position = "fill") + 
  coord_polar(theta = "y") +
  ylab("Proportion of Different Cylinder Values")
pie_chart
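The slice sizes can also be checked numerically. The following is a minimal sketch, not part of the original answer, that computes the counts and proportions behind the pie chart using base R; the names `mt`, `cyl_counts`, and `cyl_props` are introduced here only for illustration.

```r
# Work on a fresh copy of the built-in data so earlier factor conversions do not matter
mt <- datasets::mtcars

# Count cars per cylinder value, then turn the counts into proportions
cyl_counts <- table(mt$cyl)
cyl_props <- prop.table(cyl_counts)

print(cyl_counts)
print(round(cyl_props, 3))
```

Comparing these proportions with the chart is a quick way to confirm which slice is largest and which is smallest.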

Create a bar graph using ggplot that shows the number of each carb type in mtcars.

A: This is a bar graph: the x-axis shows the different carb types in our data, and the height of each bar represents the number of observations we have for each carb type. From the graph we can see that carb types 2 and 4 have the highest number of cars in the sample.

mtcars$carb <- factor(mtcars$carb)
bar_plot <- ggplot(data = mtcars, aes(x = carb)) + 
  geom_bar() + 
  xlab("Carburetor Type") +
  ylab("Counts")
bar_plot
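The directions mention that summaries may need to be computed before plotting. As a hedged illustration of that workflow, the sketch below builds the counts first and then draws the same bars with `geom_col()`, which plots precomputed heights instead of counting rows itself. The names `mt`, `carb_counts`, `n`, and `bar_from_counts` are assumptions made for this example.

```r
library(ggplot2)

# Fresh copy of the built-in data
mt <- datasets::mtcars

# Summarise first: one row per carb value with its count
carb_counts <- as.data.frame(table(carb = mt$carb))
names(carb_counts) <- c("carb", "n")

# geom_col() uses the precomputed counts as bar heights
bar_from_counts <- ggplot(carb_counts, aes(x = carb, y = n)) +
  geom_col() +
  xlab("Carburetor Type") +
  ylab("Counts")
bar_from_counts
```

Both approaches should give the same picture; the precomputed version is useful when the summary table itself needs to appear in the report.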

Next show a stacked bar graph using ggplot of the number of each gear type and how they are further divided out by cyl.

A: This is a stacked bar graph in which the x-axis represents the gear type, the y-axis represents the count of cars for each gear type, and the stacked colors show how the cylinder values are distributed within each gear type. From the graph we can see that as the number of gears increases, the number of cars in the sample with that gear type decreases. The distribution of cylinder values also changes across the gear types.

mtcars$gear <- factor(mtcars$gear)  # Converts the gear variable into a factor
mtcars$cyl <- factor(mtcars$cyl)  # Converts the cyl variable into a factor

stacked_bar <- ggplot(data = mtcars, aes(x = gear, fill = cyl)) +
  geom_bar() +
  xlab("Gear Type") + 
  ylab("Counts") + 
  scale_fill_discrete(name="Cylinders")
stacked_bar
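A cross-tabulation gives the numbers behind each stacked segment. This is a small sketch using base R's `table()`; the names `mt` and `gear_cyl` are introduced here for illustration and are not part of the original answer.

```r
# Fresh copy of the built-in data
mt <- datasets::mtcars

# Rows are gear types, columns are cylinder values; each cell is one stacked segment
gear_cyl <- table(gear = mt$gear, cyl = mt$cyl)
print(gear_cyl)

# Row totals correspond to the full height of each bar
print(rowSums(gear_cyl))
```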

Draw a scatter plot using ggplot showing the relationship between wt and mpg.

A: This is a scatter plot in which the x-axis represents car weight (in 1000 lbs) and the y-axis represents miles per (US) gallon. We can see from the graph that there is a negative correlation between car weight and MPG.

scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per Gallon (MPG)")
scatter
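The direction of the relationship between weight and MPG can be backed up with a number. The sketch below, which is an addition and not part of the original answer, computes the Pearson correlation with base R's `cor()`; `mt` and `r` are names chosen for this example.

```r
# Fresh copy of the built-in data
mt <- datasets::mtcars

# Pearson correlation between weight and miles per gallon
r <- cor(mt$wt, mt$mpg)
print(round(r, 3))
```

A value well below zero is consistent with the downward trend in the scatter plot.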

Design a visualization of your choice with ggplot using the data and write a brief summary about why you chose that visualization.

A: This is a box plot showing the distribution of MPG for each number of cylinders. I chose this visualization because a box plot can display the distribution of a numeric variable across the levels of a categorical variable. The graph shows that as the number of cylinders increases, the median MPG decreases, and the interquartile range of MPG (the height of the box) narrows as well.

# cyl was converted to a factor above, so each cylinder value gets its own box
box_plot <- ggplot(data = mtcars, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  xlab("Cylinder Type") +
  ylab("MPG")
box_plot
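A box plot draws the median and quartiles, so a per-group summary is a natural check on what the plot appears to show. The following is a minimal sketch with base R's `tapply()`; the names `mt`, `med`, `iqr`, and `mpg_summary` are assumptions made for this example.

```r
# Fresh copy of the built-in data
mt <- datasets::mtcars

# Median and interquartile range of MPG for each cylinder value
med <- tapply(mt$mpg, mt$cyl, median)
iqr <- tapply(mt$mpg, mt$cyl, IQR)

mpg_summary <- data.frame(
  cyl = names(med),
  median_mpg = as.vector(med),
  iqr_mpg = as.vector(iqr)
)
print(mpg_summary)
```

Note that `IQR()` uses R's default quantile definition, so the values may differ slightly from hinges computed another way.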


Basic Visualization Process

Wanqi Huang

2019-11-26
