In this exercise you will learn to visualize the pairwise relationships among a set of quantitative variables. To this end, you will write your own notes on Section 8.1, Correlation plots, of Data Visualization with R.
# import data
data(mpg, package="ggplot2")
# select numeric variables
df <- dplyr::select_if(mpg, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
##       displ  year   cyl   cty   hwy
## displ  1.00  0.15  0.93 -0.80 -0.77
## year   0.15  1.00  0.12 -0.04  0.00
## cyl    0.93  0.12  1.00 -0.81 -0.76
## cty   -0.80 -0.04 -0.81  1.00  0.96
## hwy   -0.77  0.00 -0.76  0.96  1.00
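Reading the strongest and weakest pairs off the matrix by eye is error-prone, so the following is a minimal sketch of one way to pull them out programmatically. The names `r_lower` and `pair_tbl` are introduced here for illustration only and are not part of the original exercise.

```r
# Recompute the correlation matrix so this block runs on its own
data(mpg, package = "ggplot2")
df <- dplyr::select_if(mpg, is.numeric)
r <- cor(df, use = "complete.obs")

# Keep each pair once: blank out the diagonal and the upper triangle
r_lower <- r
r_lower[upper.tri(r_lower, diag = TRUE)] <- NA

# Reshape to a long table with columns Var1, Var2 and r
pair_tbl <- as.data.frame(as.table(r_lower), responseName = "r")
pair_tbl <- pair_tbl[!is.na(pair_tbl$r), ]

pair_tbl[which.max(pair_tbl$r), ]       # highest positive coefficient
pair_tbl[which.min(pair_tbl$r), ]       # most negative coefficient
pair_tbl[which.min(abs(pair_tbl$r)), ]  # coefficient closest to zero
```

Working on the unrounded matrix avoids ties that rounding to two decimals can create.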
library(ggplot2)
library(ggcorrplot)
# visualize the correlations
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
## Q1 What factors have positive correlation with home price?
In this case (the mpg data, with hwy in place of home price), the only positively correlated factor is cty.

## Q2 What factors have strong positive correlation with home price?
The strongly positively correlated pairs are hwy and cty (0.96) and displ and cyl (0.93).

## Q3 What factors have negative correlation with home price?
The negatively correlated factors are displ (-0.77) and cyl (-0.76).

## Q4 What factors have strong negative correlation with home price?
The strongly negatively correlated pairs are hwy and displ (-0.77), hwy and cyl (-0.76), cty and displ (-0.80), and cty and cyl (-0.81).

## Q5 What set of two variables has the highest positive Pearson Product-Moment correlation coefficient? What set of two variables has the greatest negative Pearson Product-Moment correlation coefficient?
The highest positive coefficient is between hwy and cty (0.96). The greatest negative coefficient is between cty and cyl (-0.81).

## Q6 What set of two variables has the Pearson Product-Moment correlation coefficient that is closest to zero? Would you be sure that the two variables are not related at all? What would you do to check?
hwy and year have a coefficient of 0.00. This only means that they are not linearly related; they could still be related in a nonlinear way. A scatter plot is a good way to check.
ggplot(mpg,
       aes(x = hwy,
           y = year)) +
  geom_point()
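In the mpg data, year takes only two values (1999 and 2008), so the points in the scatter plot fall on two horizontal lines and overlap heavily. As a complementary check, the sketch below compares the distribution of hwy across the two years and computes a rank-based correlation. The choice of a boxplot and of Spearman's method is a suggestion here, not part of the original exercise.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Confirm that year behaves like a two-level grouping variable
table(mpg$year)

# Spearman's rank correlation detects monotonic, not only linear, association
cor(mpg$hwy, mpg$year, method = "spearman")

# Side-by-side boxplots compare the hwy distribution between the two years
ggplot(mpg, aes(x = factor(year), y = hwy)) +
  geom_boxplot() +
  labs(x = "year", y = "hwy")
```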
Hint: The CPS85 data set is from the mosaicData package. Explain wage instead of home price.
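A minimal sketch of how the same workflow might be applied to CPS85, assuming the mosaicData package is installed. The object names `cps_num` and `r_cps` are illustrative only.

```r
# Load the CPS85 data (assumes the mosaicData package is installed)
data(CPS85, package = "mosaicData")

# Keep the numeric variables, as was done for mpg
cps_num <- dplyr::select_if(CPS85, is.numeric)

# Correlation matrix of the numeric variables
r_cps <- cor(cps_num, use = "complete.obs")

# Correlation of each numeric variable with wage, largest first
sort(round(r_cps[, "wage"], 2), decreasing = TRUE)
```

The matrix `r_cps` can then be passed to `ggcorrplot()` in the same way as `r`.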
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
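One possible way to apply this hint is to set the options once for the whole document in a setup chunk, as sketched below with `knitr::opts_chunk$set()`. The particular values are an example, not a requirement of the exercise; the same options can also be set per chunk in the chunk header.

```r
# Default chunk options for every chunk in the document
knitr::opts_chunk$set(
  echo    = TRUE,      # show the R source code in the output
  message = FALSE,     # hide messages such as package start-up notes
  results = "markup"   # show printed results; "hide" would suppress them
)
```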