In this exercise you will learn to visualize the pairwise relationships between a set of quantitative variables. To this end, you will write your own notes on Section 8.1, Correlation plots, from Data Visualization with R.
# import data
data(SaratogaHouses, package="mosaicData")
# select numeric variables
df <- dplyr::select_if(SaratogaHouses, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
library(ggplot2)
library(ggcorrplot)
# visualize the correlations
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
Houses with more fireplaces tend to be more expensive.
Living area has a strong positive correlation with home price: the larger the living area, the more expensive the house tends to be.
Age has a negative correlation with home price.
There are no factors that have a strong negative correlation with home price.
Living area and rooms have the highest positive correlation coefficient. Bathrooms and age have the strongest negative correlation coefficient.
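Statements like the ones above can also be read directly off the correlation matrix rather than from the plot. The following is a minimal sketch, not part of the book's code: `cor_with` is a hypothetical helper name, and the demo uses the built-in `mtcars` data only so that it runs without extra packages.

```r
# Hypothetical helper: correlations of every other variable with `target`,
# sorted from most positive to most negative.
cor_with <- function(r, target) {
  others <- setdiff(colnames(r), target)
  sort(setNames(r[others, target], others), decreasing = TRUE)
}

# Demo on built-in data (an assumption made for illustration only).
demo_r <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])
round(cor_with(demo_r, "mpg"), 2)
```

In this exercise, the equivalent call would be `cor_with(r, "price")`, where `r` is the correlation matrix computed from `SaratogaHouses`.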
Hint: The CPS85 data set is from the mosaicData package. Explain wage instead of home price.
# import data
data(CPS85, package="mosaicData")
# select numeric variables
df <- dplyr::select_if(CPS85, is.numeric)
# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
library(ggplot2)
library(ggcorrplot)
# visualize the correlations
ggcorrplot(r,
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)
Education, age, and exper all have a positive correlation with wage.
No factors have a strong positive correlation with wage.
No factors have a negative correlation with wage.
No factors have a strong negative correlation with wage.
Exper and age have the highest positive correlation coefficient.
Exper and education have the strongest negative correlation coefficient.
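The pairs with the highest and lowest coefficients can be found programmatically by listing every variable pair once and sorting by correlation. This is a rough sketch under stated assumptions: `rank_pairs` is a hypothetical helper, and the built-in `mtcars` data stands in for the exercise data so that the example is self-contained.

```r
# Hypothetical helper: list each variable pair once, sorted by correlation.
rank_pairs <- function(r) {
  # lower triangle only: each pair appears once and the diagonal is skipped
  idx <- which(lower.tri(r), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
  pairs[order(pairs$r, decreasing = TRUE), ]
}

# Demo on built-in data (an assumption made for illustration only).
demo_r <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])
ranked <- rank_pairs(demo_r)
head(ranked, 1)  # pair with the highest positive coefficient
tail(ranked, 1)  # pair with the lowest (most negative) coefficient
```

With the matrix from this exercise, `rank_pairs(r)` would list the pairs among the numeric `CPS85` variables in the same way.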
ggplot(CPS85,
       aes(x = age,
           y = wage)) +
  geom_point()
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
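One way to apply these three options is to set them as defaults for the whole document in a setup chunk. The sketch below assumes the `knitr` package is installed (it is what renders R Markdown); the particular values are only an illustration, not the required answer.

```r
# Set default chunk options for every chunk that follows.
knitr::opts_chunk$set(
  message = FALSE,     # hide messages such as package start-up notes
  echo    = TRUE,      # keep showing the source code
  results = "markup"   # show text output (use "hide" to suppress it)
)
```

The same options can be given in an individual chunk header instead, for example `{r, message=FALSE, results='hide'}`, when only one chunk should behave differently.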