During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.
The objective of this assignment is to introduce you to R Markdown and to complete and explain basic plots before moving on to more complicated ways to graph data.
A couple of tips: remember that there may be preprocessing involved in your graphics, so you may have to do summaries or calculations to prepare the data. Those should be included in your work.
To ensure accuracy, pay close attention to axes and labels. You will be evaluated based on the accuracy of your graphics.
The final product of your homework (this file) should include a short summary of each graphic.
To submit this homework you will create the document in RStudio, using the knitr package (button included in RStudio), and then publish the document to your RPubs account. Once it is uploaded, you will submit the link to that document on Canvas. Please make sure that this link is hyperlinked and that I can see the visualization and the code required to create it.
Find the mtcars data in R. This is the dataset that you will use to create your graphics.
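One way to locate and inspect the data before plotting is sketched below. It assumes only base R, where mtcars ships in the built-in datasets package; nothing here is part of the original assignment text.

```r
# mtcars is a built-in dataset, so no download or import is needed
data(mtcars)

# Structure: one row per car model, all columns stored as numeric
str(mtcars)

# First few rows, to see the variables used later (cyl, carb, gear, wt, mpg)
head(mtcars, 3)

# Confirm there are no missing values before summarizing or plotting
colSums(is.na(mtcars))
```

Running `?mtcars` in the console opens the help page, which describes each variable and its units.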
mtcars data set that have different cylinder (cyl) values.
A: This is a pie chart: the red area represents the proportion of cars with cyl=4, the green area represents the proportion of cars with cyl=6, and the blue area represents the proportion of cars with cyl=8. From the graph we can see that 8-cylinder cars have the highest proportion and 6-cylinder cars have the lowest.
library(ggplot2)
mtcars$cyl <- factor(mtcars$cyl) # Create a categorical variable
pie_chart <- ggplot(mtcars, aes(x = "", fill = cyl)) +
  geom_bar(position = "fill") +   # stack to proportions that sum to 1
  coord_polar(theta = "y") +      # wrap the stacked bar into a pie
  ylab("Proportion of Different Cylinder Values")
pie_chart
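The proportions read off the pie chart can be checked with a short summary. This is a minimal sketch rather than part of the original answer; the names `cars`, `cyl_counts`, and `cyl_props` are illustrative, and a fresh copy of the data is used so the check does not depend on the factor conversion above.

```r
# Work on a fresh copy so earlier factor conversions do not matter
cars <- datasets::mtcars

# Number of cars for each cylinder value
cyl_counts <- table(cars$cyl)
cyl_counts

# Share of the sample for each cylinder value (sums to 1)
cyl_props <- prop.table(cyl_counts)
cyl_props

# Cylinder values with the largest and smallest share
names(which.max(cyl_props))
names(which.min(cyl_props))
```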
carb type in mtcars.
A: This is a bar graph: the x-axis shows the carb types in the data, and the height of each bar represents the number of observations for each carb type. From the graph we can see that carb types 2 and 4 have the highest number of cars in the sample.
mtcars$carb <- factor(mtcars$carb)
bar_plot <- ggplot(data = mtcars, aes(x = carb)) +
  geom_bar() +
  xlab("Carburetor Type") +
  ylab("Counts")
bar_plot
gear type and how they are further divided out by cyl.
A: This is a stacked bar graph: the x-axis represents the gear type, the y-axis represents the count of cars for each gear type, and the stacked colors show how the cylinder values are distributed within each gear type. From the graph we can see that as the number of gears increases, the number of cars in the sample decreases. The distribution of cylinder values also changes across gear types.
mtcars$gear <- factor(mtcars$gear) # Converts the gear variable into a factor
mtcars$cyl <- factor(mtcars$cyl) # Converts the cyl variable into a factor
stacked_bar <- ggplot(data = mtcars, aes(x = gear, fill = cyl)) +
  geom_bar() +
  xlab("Gear Type") +
  ylab("Counts") +
  scale_fill_discrete(name = "Cylinders")
stacked_bar
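The counts behind the stacked bars can be tabulated directly, which is a useful check on both the bar heights and the color segments. This is a hedged sketch; `cars` and `gear_by_cyl` are illustrative names, not part of the original answer.

```r
# Fresh copy of the data, independent of the factor conversions above
cars <- datasets::mtcars

# Cross-tabulate gear (rows) against cyl (columns)
gear_by_cyl <- table(gear = cars$gear, cyl = cars$cyl)
gear_by_cyl

# Row totals correspond to the total height of each bar
rowSums(gear_by_cyl)

# Same table with row and column totals appended
addmargins(gear_by_cyl)
```

The row totals fall from 15 cars with 3 gears to 12 with 4 gears and 5 with 5 gears, which matches the trend described in the answer.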
wt and mpg.
A: This is a scatter plot: the x-axis represents car weight and the y-axis represents miles per (US) gallon. We can see from the graph that there is a negative correlation between car weight and MPG.
scatter <- ggplot(mtcars, aes(x = wt, y = mpg))
scatter +
  geom_point() +
  xlab("Weight (1000 lbs)") +   # wt is recorded in units of 1000 lbs
  ylab("MPG")
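The negative relationship seen in the scatter plot can be quantified with a correlation coefficient. This is a minimal sketch, assuming Pearson correlation is an acceptable summary here; the names `cars` and `r` are illustrative.

```r
# Fresh copy of the data
cars <- datasets::mtcars

# Pearson correlation between weight and fuel economy
r <- cor(cars$wt, cars$mpg)
r
```

The result is roughly -0.87, a strong negative linear association that is consistent with the downward pattern in the plot.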
A: This is a box plot showing the distribution of MPG for each cylinder value. I chose this visualization because a box plot displays the distribution of a numeric variable across the levels of a categorical variable. The graph shows that as the number of cylinders increases, the median MPG decreases, and that MPG varies most among 4-cylinder cars.
# cyl must be a factor (converted earlier) so that each value gets its own box
box_plot <- ggplot(data = mtcars, aes(x = cyl, y = mpg))
box_plot +
  geom_boxplot() +
  xlab("Cylinder Type") +
  ylab("MPG")
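The center line and box height of each box correspond to the median and interquartile range, so the reading of the plot can be checked numerically. This is a hedged sketch; `cars`, `med`, `iqr`, and `mpg_summary` are illustrative names.

```r
# Fresh copy of the data
cars <- datasets::mtcars

# Median MPG per cylinder value (the line inside each box)
med <- tapply(cars$mpg, cars$cyl, median)

# Interquartile range per cylinder value (the height of each box)
iqr <- tapply(cars$mpg, cars$cyl, IQR)

# Combine into one small summary table
mpg_summary <- data.frame(
  cyl        = names(med),
  median_mpg = as.numeric(med),
  iqr_mpg    = as.numeric(iqr)
)
mpg_summary
```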