Gayathri Mutyala
During ANLY 512 we will be studying the theory and practice of data visualization. We will be using R and the packages within R to assemble data and construct many different types of visualizations. We begin by studying some of the theoretical aspects of visualization. To do that we must appreciate the basic steps in the process of making a visualization.
The objective of this assignment is to introduce you to R markdown and to complete and explain basic plots before moving on to more complicated ways to graph data.
A couple of tips: many graphics involve preprocessing, so you may have to compute summaries or other calculations to prepare the data. Include those steps in your work.
To ensure accuracy, pay close attention to axes and labels. You will be evaluated on the accuracy of your graphics.
The final product of your homework (this file) should include a short summary of each graphic.
To submit this homework, create the document in RStudio using the knitr package (button included in RStudio) and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Moodle. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.
Find the mtcars data in R. This is the dataset that you will use to create your graphics.
Proportion of cars in the mtcars data set that have different carb values.
data("mtcars")
View(mtcars)
# Calculating the frequency of different carb values using table function
carb <- table(mtcars$carb)
# Create percent proportion label values
percent_labels <- round(100*prop.table(carb),1)
# Create labels for each pie proportion
pie_labels <- paste(percent_labels, "%", sep="")
#To create the Pie Chart with topo colors palette
pie(carb, labels = pie_labels , main = 'Percentage of Cars by Carb', cex = 0.8, col=topo.colors(6))
# Adding legend to the pie chart
legend("bottomleft", c("Carb-1","Carb-2","Carb-3","Carb-4","Carb-6","Carb-8"), cex=0.8, fill=topo.colors(6))
From the pie chart above, we can see that cars with carb 2 and carb 4 have equal proportions of 31.2% each, the highest of any group. Carb 6 and carb 8 have equal proportions of 3.1% each, the lowest.
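The percentages read off the pie chart can be checked directly from the frequency table. The following is a minimal sketch; the names `carb_counts`, `carb_pct`, and `carb_legend` are illustrative and not part of the original work. It also shows one possible way to build the legend labels from the table itself rather than typing them by hand, which keeps the legend in step with the data.

```r
# Frequency of each carb value in mtcars
carb_counts <- table(mtcars$carb)

# Percentage share of each carb value, rounded to one decimal place
carb_pct <- round(100 * prop.table(carb_counts), 1)

# Legend labels derived from the table names (assumed naming scheme "Carb-<n>")
carb_legend <- paste0("Carb-", names(carb_counts))

print(carb_counts)
print(carb_pct)
print(carb_legend)
```

Passing `carb_legend` to `legend()` in place of the hard-coded vector should give the same labels as before.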
Number of cars of each gear type in mtcars.
# Calculating the frequency
gear<-table(mtcars$gear)
# To Create a Bar Plot
barplot(gear,main = "Number of Each Gear Type",xlab ="Number of Gears",ylab = "Number of Cars",col=terrain.colors(5))
The bar graph above shows that 3 gears is the most common configuration and 5 gears the least common. Of the 32 cars, 15 have 3 gears, 12 have 4 gears, and 5 have 5 gears.
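Exact bar heights are easier to report from the underlying table than to read off the axis. A small sketch, assuming the illustrative names `gear_counts` and `gear_summary` (neither appears in the original code):

```r
# Count of cars for each number of gears
gear_counts <- table(mtcars$gear)

# Combine counts and percentage shares into one summary table
gear_summary <- data.frame(
  gears   = names(gear_counts),
  cars    = as.vector(gear_counts),
  percent = as.vector(round(100 * prop.table(gear_counts), 1))
)

print(gear_summary)
```

Printing a summary like this next to the plot gives the reader the precise values behind each bar.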
Number of cars of each gear type and how they are further divided out by cyl.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.3
ggplot(data = mtcars, aes(x = factor(gear), fill = factor(cyl))) +
  geom_bar() +
  scale_fill_discrete("Number of Cylinders") +
  ggtitle("Number of Cars Per Gear Type & Number of Cylinders") +
  xlab("Number of Gears") +
  ylab("Count")
The stacked bar graph above breaks the cars down by gear type and number of cylinders. Most of the 3-gear cars (12 of 15) have 8 cylinders, and more than half of the 4-gear cars (8 of 12) have 4 cylinders. Six-cylinder cars are the least common group overall, with 7 of the 32 cars.
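The counts behind each stacked segment can be recovered with a two-way table. This is a sketch only; `gear_cyl` and `gear_cyl_pct` are illustrative names introduced here.

```r
# Cross-tabulate number of gears (rows) against number of cylinders (columns)
gear_cyl <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# Add row and column totals for reference
print(addmargins(gear_cyl))

# Share of each cylinder count within each gear type (rows sum to 100)
gear_cyl_pct <- round(100 * prop.table(gear_cyl, margin = 1), 1)
print(gear_cyl_pct)
```

The row percentages make statements such as "most 3-gear cars have 8 cylinders" directly checkable.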
Relationship between wt and mpg.
plot(mtcars$wt, mtcars$mpg, main = "Relationship between 'Wt' & 'MPG'", xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon", col = 'darkred', cex = 0.9, pch = 16, panel.first = grid())
# regression line
abline(lm(mtcars$mpg~mtcars$wt), col="darkblue")
The scatter plot above shows a negative relationship between weight and MPG: as weight increases, MPG tends to decrease. The regression line, drawn in dark blue, makes the negative correlation between car weight and MPG clear.
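The strength of this relationship can be quantified rather than judged by eye. A minimal sketch, assuming the illustrative names `fit` and `r_wt_mpg` (not from the original code), that fits the same linear model as the `abline()` call and computes the Pearson correlation:

```r
# Fit the same simple linear regression that the plotted line represents
fit <- lm(mpg ~ wt, data = mtcars)

# Pearson correlation between weight and miles per gallon
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)

# Intercept and slope of the regression line
print(coef(fit))
print(r_wt_mpg)
```

A negative slope and a correlation close to -1 both support the reading of the plot. Since `wt` is recorded in units of 1000 lbs, the slope is the expected change in MPG per additional 1000 lbs.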
library(psych)
## Warning: package 'psych' was built under R version 3.5.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
pairs.panels(mtcars[,1:6],main="ScatterPlot Matrix")
I used a scatter plot matrix to visualize the data and to see the correlations within it. This makes further data analysis easier, because the matrix shows at a glance how each pair of variables is related.
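The correlations displayed in the upper panels of the matrix can also be produced as a plain table. This sketch uses base R only and the illustrative name `cor_mat`; it covers the same six columns passed to `pairs.panels()`.

```r
# Pairwise Pearson correlations for the first six mtcars columns
# (mpg, cyl, disp, hp, drat, wt), rounded to two decimals
cor_mat <- round(cor(mtcars[, 1:6]), 2)

print(cor_mat)
```

Scanning a table like this for values near 1 or -1 is a quick way to pick out the strongest pairwise relationships before looking at the individual panels.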