This is a simple graphical representation of the relationships between variables in the mtcars dataset, prepared as part of the Data Visualization class homework. Below is a short summary of the mtcars data set and the types of its variables.
## Summary of the data set
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Objective: Create a pie chart showing the proportion of cars from the mtcars data set that have different carb values.
Below is the R code for data cleaning, preparation, and visualization.
counts <- table(mtcars$carb)
total <- sum(counts)
proportions <- round((counts*100)/total,digits = 1)
lbls <- paste(names(proportions), ", ", proportions, "%", sep = "")
pie(proportions, labels = lbls, main="Pie Chart of Proportion of Cars with Different Number of Carburetors")
The pie chart shows the percentage breakdown of cars in the mtcars data set by number of carburetors. Each label gives the number of carburetors followed by the percentage of cars in the data set that have that many carburetors. Because each percentage is rounded to one decimal place, the labels do not sum to exactly 100%; the displayed total is off by about 0.1 percentage points.
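The rounding effect described above can be checked directly. The following is a minimal sketch rather than part of the original homework code: it swaps the manual division for base R's `prop.table()`, and the names `pct_exact` and `pct_rounded` are introduced here purely for illustration.

```r
# Exact and rounded percentage shares of each carburetor count
counts <- table(mtcars$carb)
pct_exact <- 100 * prop.table(counts)        # sums to 100 up to floating point
pct_rounded <- round(pct_exact, digits = 1)  # what the pie labels display

# Compare the two totals; any gap comes purely from rounding the labels
sum(pct_exact)
sum(pct_rounded)
sum(pct_rounded) - sum(pct_exact)
```

With six categories each rounded to one decimal place, the displayed total can drift from 100 by at most 0.3 percentage points.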
Objective: Create a bar graph that shows the number of each gear type in mtcars.
Below is the R code for data cleaning, preparation, and visualization.
gear_count <- table(mtcars$gear)
barplot(gear_count, main="Car Distribution by Number of Gears", xlab="Number of Gears", col = "lightblue")
The bar chart above places the three gear levels (3, 4 and 5) on the x-axis and shows, on the y-axis, the total number of cars that belong to each level.
Objective: Next show a stacked bar graph of the number of each gear type and how they are further divided out by cyl.
Below is the R code for data cleaning, preparation, and visualization.
stacked_count <- table(mtcars$cyl, mtcars$gear)
barplot(stacked_count, main="Car Distribution by Number of Gears and Cylinders",
xlab="Number of Gears", col=c("darkblue","red","darkgreen"),
legend = rownames(stacked_count), args.legend = list(title = "Number of Cylinders"))
Moving a step further, this stacked bar plot extends the bar chart from the previous part. As before, it displays the number of cars on the y-axis for each number of gears on the x-axis. It adds another layer by breaking down the cars in each gear group by number of cylinders (4, 6 and 8), with a different color for each cylinder count. For example, the dark blue segment for gear level 3 represents 1 car, and dark blue indicates 4 cylinders. This means that out of the 15 cars that have 3 gears, only 1 has 4 cylinders, 2 have 6 cylinders and the remaining 12 have 8 cylinders.
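The counts read off the stacked bars can be confirmed from the underlying contingency table. This is a small sketch using base R's `addmargins()` and `prop.table()`; naming the table dimensions `cyl` and `gear` is a choice made here for readability and is not in the original code.

```r
# Cross-tabulate cylinders (rows) against gears (columns)
stacked_count <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Append row and column totals to check the segment and bar heights
addmargins(stacked_count)

# Share of each cylinder count within a gear level (each column sums to 1)
round(prop.table(stacked_count, margin = 2), 2)
```

The column totals correspond to the full bar heights, and each cell corresponds to one colored segment.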
Objective: Draw a scatter plot showing the relationship between wt and mpg.
Below is the R code for data cleaning, preparation, and visualization.
plot(mtcars$wt, mtcars$mpg, main="Distribution of Car Mileage vs Car Weight", xlab="Car Weight (1000 lbs)", ylab="Miles Per Gallon ", col = "blue")
This scatter plot shows how miles per gallon, on the y-axis, changes with car weight, on the x-axis. Scatter plots reveal the directional pattern in the response variable (y-axis) as the independent variable (x-axis) changes. Here, the graph illustrates that, on average, the mileage (mpg) of a car decreases as its weight increases.
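The downward trend visible in the scatter plot can also be summarized numerically. The sketch below is an addition for illustration, not part of the original homework: it assumes a straight-line relationship is a reasonable first approximation, and the names `r`, `fit` and `slope` are introduced here.

```r
# Pearson correlation between weight and fuel economy
r <- cor(mtcars$wt, mtcars$mpg)

# Simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]  # estimated change in mpg per additional 1000 lbs

r
slope
```

Both values come out negative, which agrees with the pattern in the plot. Passing `fit` to `abline()` right after the `plot()` call would overlay the fitted line on the scatter plot.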
Objective: Design a visualization of your choice using the data and write a brief summary about why you chose that visualization.
Below is the R code for data cleaning, preparation, and visualization.
boxplot(mtcars$mpg ~ mtcars$cyl, main = "Box Plot of Mileage vs Number of Cylinders", xlab = "Number of Cylinders", ylab = "Miles per Gallon",
col = "lightgreen")
I have found that one of the most useful plots for visualization is the box plot, especially when representing a factor variable with multiple levels. A box plot shows the median of the response variable for each level of the independent variable on the x-axis. It also shows the interquartile range for each level, which gives a visual sense of the spread of the data points in that level. Additionally, the whiskers show the range of the non-outlying values for each level, and points beyond them are drawn individually, highlighting outliers at a glance. We see this for the 8-cylinder group, where an outlier with a very low mpg value is displayed below the box.
Here, I have used a box plot to display the miles per gallon values for cars with different numbers of cylinders. Based on the visualization, the median miles per gallon is higher for cars with fewer cylinders, i.e. mpg decreases as the number of cylinders increases. We can also infer that mpg varies more among cars with 4 cylinders than among cars with 6 or 8 cylinders, as shown by the larger interquartile range for the 4-cylinder group compared with the 6- and 8-cylinder groups.
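The quantities read off the box plot can be computed per cylinder group as a cross-check. This is a hedged sketch added for illustration: the names `mpg_median`, `mpg_iqr` and `mpg_outliers` are introduced here. Note that `IQR()` uses quantiles while `boxplot()` draws the box from hinges, so the two measures of spread can differ slightly.

```r
# Median and interquartile range of mpg within each cylinder group
mpg_median <- tapply(mtcars$mpg, mtcars$cyl, median)
mpg_iqr <- tapply(mtcars$mpg, mtcars$cyl, IQR)

# Values that a box plot draws as individual points
# (more than 1.5 times the box length beyond the hinges)
mpg_outliers <- lapply(split(mtcars$mpg, mtcars$cyl),
                       function(x) boxplot.stats(x)$out)

mpg_median
mpg_iqr
mpg_outliers
```

`boxplot.stats()` applies the same default outlier rule as `boxplot()`, so any values it returns for a group are the points drawn outside that group's whiskers.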