Directions

During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.

The objective of this assignment is to introduce you to R Markdown and to have you complete and explain basic plots before moving on to more complicated ways to graph data.

The final product of your homework (this file) should include a short summary of each graphic.

To submit this homework, create the document in RStudio using the knitr package (a button is included in RStudio), then publish the document to your RPubs account. Once it is uploaded, submit the link to that document on Moodle. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Questions

Find the mtcars data in R. This is the dataset that you will use to create your graphics.

  1. Create a pie chart showing the proportion of cars from the mtcars data set that have different cylinder (cyl) values.
cyl.freq <- table(mtcars$cyl) #save cylinder frequencies
labls <- names(cyl.freq) #use the unique cylinder values as the base labels
percents <- round(cyl.freq/sum(cyl.freq)*100) #percentage of cars for each cylinder value
labls <- paste(labls, percents) #append percent values to labels
labls <- paste(labls,"%",sep="") #append % symbol to labels with no separating space
pie(cyl.freq,labels = labls,col=c("red","blue","green"), main="Cars Distribution on Different Cylinder Values") #create pie-chart
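
The percentages on the pie labels are rounded to whole numbers. As a quick check, the unrounded shares can be computed with base R's `prop.table`. This is a minimal sketch; the name `cyl.share` is introduced here only for illustration.

```r
# Share of cars for each cylinder value, as proportions that sum to 1
cyl.share <- prop.table(table(mtcars$cyl))

# The same shares as percentages with one decimal place
round(100 * cyl.share, 1)
```

Comparing this output with the pie labels shows how much the whole-number rounding hides.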

  1. Create a bar graph, that shows the number of each carb type in mtcars.
carb.freq<-table(mtcars$carb) #Save carburetor frequencies
labels.carb<-names(carb.freq) #Create labels for unique carburetor values
barplot(carb.freq,main ="Car Distribution",xlab="Carburetor Values",ylab="Number of Cars",names.arg=labels.carb,col=rainbow(length(labels.carb))) #Create barplot

  1. Next show a stacked bar graph of the number of each gear type and how they are further divided out by cyl.
mytable<-table(mtcars$cyl,mtcars$gear) #Rows = cylinders (stacked segments), columns = gears (bars)
label.gears<-paste(colnames(mytable),"Gears") #Create a vector with gears' labels
barplot(mytable,main="Car Distribution by Gears & Cylinders", 
        xlab="Number of Gears",
        ylab="Number of Cars",
        names.arg = label.gears,
        col = c("grey","orange","pink"),
        cex.names = 0.8,
        cex.axis = 0.8,
        space = 0.05,
        legend.text = paste(rownames(mytable),"cyl"), #Legend identifies the cylinder segments
        args.legend = list(x ='topright', bty='n', inset=c(-0.085,0)))
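
`barplot` draws one bar per column of the matrix it receives and one stacked segment per row, so the orientation of the table decides what the bars represent. The sketch below is one way to check this before plotting; the name `counts` and the `cyl`/`gear` dimension labels are introduced here only for illustration.

```r
# Cross-tabulate with cylinders in rows and gears in columns
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Add row and column totals; each column total is the height of that gear's bar
addmargins(counts)

# Column names are the bar labels, row names are the stacked segments
colnames(counts)
rownames(counts)
```

If the column names are not the gear values, the bars are grouped by the wrong variable and the table arguments should be swapped.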

  1. Draw a scatter plot showing the relationship between wt and mpg.
plot(mtcars$wt,mtcars$mpg,
     main="Relationship between \n Car Weight and Miles per Gallon",
     cex.main=0.9, cex.lab=0.9,
     xlab="Car Weight", ylab="Miles per Gallon",
     pch=20,col="grey30") #Scatterplot
abline(lm(mtcars$mpg~mtcars$wt),col="darkgreen") #Regression line for y~x
lines(lowess(mtcars$wt,mtcars$mpg), col="red") # lowess line
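
The direction and strength of the trend in the scatter plot can also be expressed as numbers. The sketch below refits the same linear model as the regression line and computes the Pearson correlation; the name `fit` is introduced here only for illustration.

```r
# Linear model of mpg on weight (wt is recorded in units of 1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Pearson correlation between weight and mpg
cor(mtcars$wt, mtcars$mpg)
```

A negative slope and a negative correlation both indicate that heavier cars in this dataset tend to have lower mpg.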

  1. Design a visualization of your choice using the data and write a brief summary about why you chose that visualization.
#par(cex=0.9) # is for y-axis
boxplot(mpg~cyl,data=mtcars,
        main="How does Number of Cylinders Impact Miles Per Gallon? \n (No. of Cylinders vs MPG)",
        xlab="Number of Cylinders", ylab="Miles Per Gallon (MPG)",
        names=c("4 cylinders", "6 cylinders", "8 cylinders"))

Observation:

I wanted to see how the number of cylinders in a car affects its fuel efficiency, i.e. miles per gallon.

I chose this visualization because a box plot gives a compact visual five-number summary of mpg for each cylinder group: the minimum, the lower quartile, the median, the upper quartile, and the maximum. The lower and upper edges of each rectangle mark the lower and upper quartiles, so the height of the box is the interquartile range. The thick line inside the box marks the median, and the whiskers extend to the smallest and largest mpg values that are not flagged as outliers.
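
The five-number summary behind each box can also be computed directly. This is a minimal sketch using base R's `fivenum`, which returns Tukey's hinges, so the quartile values may differ slightly from those given by `quantile`. The name `five.num` is introduced here only for illustration.

```r
# Five-number summary (min, lower hinge, median, upper hinge, max) of mpg
# for each cylinder group; the result is a list keyed by cylinder value
five.num <- tapply(mtcars$mpg, mtcars$cyl, fivenum)
five.num

# Median mpg per cylinder group, taken from the third element of each summary
sapply(five.num, function(s) s[3])
```

Reading these values next to the plot makes it easier to tell the box edges (quartiles) apart from the whisker ends.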

Looking at the number of cylinders, I can observe that for the 4-cylinder group the median mpg is around 26, the lower quartile is about 23, and the upper quartile is just over 30. In contrast, cars in the 6- and 8-cylinder groups have much lower mpg values: the median mpg is around 20 in the 6-cylinder group and only about 15 in the 8-cylinder group.

Hence, from this visualization, we can conclude that, in this dataset, the greater the number of cylinders, the less fuel-efficient the car tends to be.