The mtcars dataset contains data on 32 car models (not 32 brands; several makes appear more than once). You can find out more about the car characteristics included in the dataset here.
Let’s start, as usual, by loading packages we’re likely to need and today’s data:
.libPaths(c("/home/rstudioshared", "/home/rstudioshared/shared_files/packages"))
library(dplyr); library(ggplot2); library(corrplot)
data(mtcars)
Next, also as usual, let’s take a look at the data:
View(mtcars)
Now, let’s make a histogram of miles per gallon (mpg) with areas filled based on the number of cylinders of the engine:
ggplot(mtcars, aes(mpg, fill=as.factor(cyl)))+geom_histogram()
We could also use dplyr to find the mean and standard deviation of miles per gallon by number of cylinders.
mtcars %>% group_by(cyl) %>% summarize(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
What kind of relationship do you see between miles per gallon and number of cylinders? One way to quantify this relationship is to find the correlation between mpg and cyl.
Here are two ways of doing this in R:
cor(mtcars$mpg, mtcars$cyl)
with(mtcars, cor(mpg, cyl))
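To see what cor() is computing, here is a minimal sketch that rebuilds the (default, Pearson) correlation from its definition: the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. The names x, y, r_manual and r_builtin are just illustrative and are not used elsewhere in this walkthrough.

```r
# Pearson correlation computed by hand and compared with cor().
data(mtcars)
x <- mtcars$mpg
y <- mtcars$cyl

# Deviations of each observation from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Numerator: co-variation of x and y; denominator: their combined spread
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in version (method = "pearson" is the default)
r_builtin <- cor(x, y)

# Should be TRUE, up to floating-point tolerance
all.equal(r_manual, r_builtin)
```

The sign of the result comes entirely from the numerator: it is negative when above-average values of one variable tend to go with below-average values of the other.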
Next, let’s look at the relationship between mpg and drat (the rear axle ratio):
ggplot(mtcars, aes(mpg, drat))+geom_point()
Before calculating the correlation between mpg and drat, take a moment to think about what kind of correlation you would expect. Will it be positive or negative? Will it be near zero or closer to 1 or -1? Once you’ve made your guess, calculate the correlation between these variables.
Now, let’s look at that same graph but with points colored by the number of cylinders:
ggplot(mtcars, aes(mpg, drat, color=as.factor(cyl)))+geom_point()
We can also find correlations within groups. What kinds of correlations do you expect to find between mpg and drat for cars with 4 cylinders? 6 cylinders? 8 cylinders?
We can use dplyr to find these correlations:
mtcars %>% group_by(cyl) %>% summarize(cor_drat_mpg = cor(drat, mpg))
How do the correlations within groups compare to the overall correlation between drat and mpg?
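One way to make that comparison easier is to put the overall and within-group correlations in a single vector. The sketch below uses only base R (split() and sapply()) rather than dplyr, so it also shows what the grouped calculation is doing under the hood; by_cyl, within_cor and overall_cor are illustrative names chosen here.

```r
# Overall vs. within-group correlation of drat and mpg, using base R only.
data(mtcars)

# Split the data frame into one data frame per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Correlation of drat and mpg inside each cylinder group
within_cor <- sapply(by_cyl, function(d) cor(d$drat, d$mpg))

# Correlation of drat and mpg ignoring the grouping
overall_cor <- cor(mtcars$drat, mtcars$mpg)

# Side-by-side view, rounded for readability
round(c(overall = overall_cor, within_cor), 2)
```

If the within-group values differ noticeably from the overall value, part of the overall relationship is coming from differences between the cylinder groups rather than from the relationship within any one group.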
Finally, we can look at the correlations all at once either as a matrix or in a plot. Here are three ways to do it.
View(cor(mtcars))
corrplot(cor(mtcars))
corrplot(cor(mtcars), method="number")
What is the strongest positive correlation in this data set? What is the strongest inverse correlation?
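Once you have made your own guess from the matrix or the plot, you can check it programmatically. The following is a rough sketch, not the only way to do it: it flattens the correlation matrix into one row per pair of variables and picks out the largest and smallest values. The names cm and cor_pairs and the column names var1, var2 and r are assumptions made for this example.

```r
# Find the strongest positive and strongest negative correlation in mtcars.
data(mtcars)
cm <- cor(mtcars)

# Blank out the diagonal (always 1) and the upper triangle (duplicates)
cm[upper.tri(cm, diag = TRUE)] <- NA

# Reshape the matrix to long format: one row per pair of variables
cor_pairs <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(cor_pairs) <- c("var1", "var2", "r")
cor_pairs <- cor_pairs[!is.na(cor_pairs$r), ]

# Strongest positive and strongest negative (inverse) correlation
cor_pairs[which.max(cor_pairs$r), ]
cor_pairs[which.min(cor_pairs$r), ]
```

Removing the diagonal matters here: without that step, the "strongest" correlation would simply be a variable with itself.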