In this exercise you will learn to visualize the pairwise relationships between a set of quantitative variables. To this end, you will make your own notes on Section 8.1, Correlation plots, of Data Visualization with R.

# import data
data(SaratogaHouses, package="mosaicData")

# select numeric variables
df <- dplyr::select_if(SaratogaHouses, is.numeric)

# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
##            price lotSize   age landValue livingArea pctCollege bedrooms
## price       1.00    0.16 -0.19      0.58       0.71       0.20     0.40
## lotSize     0.16    1.00 -0.02      0.06       0.16      -0.03     0.11
## age        -0.19   -0.02  1.00     -0.02      -0.17      -0.04     0.03
## landValue   0.58    0.06 -0.02      1.00       0.42       0.23     0.20
## livingArea  0.71    0.16 -0.17      0.42       1.00       0.21     0.66
## pctCollege  0.20   -0.03 -0.04      0.23       0.21       1.00     0.16
## bedrooms    0.40    0.11  0.03      0.20       0.66       0.16     1.00
## fireplaces  0.38    0.09 -0.17      0.21       0.47       0.25     0.28
## bathrooms   0.60    0.08 -0.36      0.30       0.72       0.18     0.46
## rooms       0.53    0.14 -0.08      0.30       0.73       0.16     0.67
##            fireplaces bathrooms rooms
## price            0.38      0.60  0.53
## lotSize          0.09      0.08  0.14
## age             -0.17     -0.36 -0.08
## landValue        0.21      0.30  0.30
## livingArea       0.47      0.72  0.73
## pctCollege       0.25      0.18  0.16
## bedrooms         0.28      0.46  0.67
## fireplaces       1.00      0.44  0.32
## bathrooms        0.44      1.00  0.52
## rooms            0.32      0.52  1.00
library(ggplot2)
library(ggcorrplot)

# visualize the correlations
ggcorrplot(r, 
           hc.order = TRUE,
           lab = TRUE)

Q1 What factors have positive correlation with home price?

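Every variable except age is positively correlated with price. Fireplaces (0.38) and bedrooms (0.40) are examples of moderate positive correlation.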

Q2 What factors have strong positive correlation with home price?

Living area (0.71) and bathrooms (0.60)

Q3 What factors have negative correlation with home price?

Age (-0.19)

Q4 What factors have strong negative correlation with home price?

There are none in this case. The only negative correlation with price is age at -0.19, which is weak.
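The answers to Q1 through Q4 can be cross-checked by filtering the price row of the correlation matrix. The sketch below retypes that row from the rounded output above so it runs without the data. The 0.6 cutoff for "strong" is an assumption made for illustration, not a rule from the exercise.

```r
# Correlations with price, copied from the rounded matrix printed above
price_r <- c(lotSize = 0.16, age = -0.19, landValue = 0.58,
             livingArea = 0.71, pctCollege = 0.20, bedrooms = 0.40,
             fireplaces = 0.38, bathrooms = 0.60, rooms = 0.53)

# Assumed cutoff for a "strong" correlation (illustrative only)
strong_cutoff <- 0.6

positive        <- names(price_r)[price_r > 0]
strong_positive <- names(price_r)[price_r >= strong_cutoff]
negative        <- names(price_r)[price_r < 0]
strong_negative <- names(price_r)[price_r <= -strong_cutoff]

# Largest to smallest, for reading off the ranking
sort(price_r, decreasing = TRUE)
```

With the full matrix in memory, `r["price", colnames(r) != "price"]` gives the same row without retyping it.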

Q5 What set of two variables has the highest positive Pearson Product-Moment correlation coefficient? What set of two variables has the greatest negative Pearson Product-Moment correlation coefficient?

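Highest positive: living area and rooms (0.73). Greatest negative: bathrooms and age (-0.36).

Q6 What set of two variables has the Pearson Product-Moment correlation coefficient that is closest to zero? Would you be sure that the two variables are not related at all? What would you do to check?

lotSize and age (-0.02). A coefficient near zero only rules out a linear relationship, so a scatterplot is needed to check for a nonlinear one.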

# simple scatterplot
ggplot(SaratogaHouses, 
       aes(x = landValue, 
           y = age)) +
  geom_point()

The scatterplot of landValue against age (a pair that also has a correlation of -0.02) shows neither a linear nor a nonlinear relationship.
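A correlation near zero only says there is no linear trend. As a small made-up illustration (the vectors `x` and `y` are not from SaratogaHouses), the sketch below builds a relationship that is fully determined but curved, and whose Pearson coefficient is still zero. This is why the scatterplot check matters.

```r
# Made-up data: y depends entirely on x, but along a parabola
x <- -10:10
y <- x^2

# Pearson correlation measures linear association only
pearson_r <- cor(x, y)
pearson_r

# A quick base-R scatterplot reveals the curve the coefficient misses
plot(x, y)
```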

Q7 Plot correlation for CPS85 in the same way as above. Repeat Q1-Q6.

Hint: The CPS85 data set is from the mosaicData package. Explain wage instead of home price.

# import data
data(CPS85, package="mosaicData")

# select numeric variables
df <- dplyr::select_if(CPS85, is.numeric)

# calculate the correlations
r <- cor(df, use="complete.obs")
round(r,2)
##       wage  educ exper   age
## wage  1.00  0.38  0.09  0.18
## educ  0.38  1.00 -0.35 -0.15
## exper 0.09 -0.35  1.00  0.98
## age   0.18 -0.15  0.98  1.00
library(ggplot2)
library(ggcorrplot)

# visualize the correlations
ggcorrplot(r, 
           hc.order = TRUE,
           lab = TRUE)
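One way to answer Q5 and Q6 for a matrix like this, without scanning it by eye, is to list each pair once and rank the pairs. The sketch below retypes the rounded CPS85 matrix from the output above so it is self-contained. The names `r_cps` and `pair_tbl` are introduced here for illustration only.

```r
# CPS85 correlations, copied from the rounded output above
vars <- c("wage", "educ", "exper", "age")
r_cps <- matrix(c( 1.00,  0.38,  0.09,  0.18,
                   0.38,  1.00, -0.35, -0.15,
                   0.09, -0.35,  1.00,  0.98,
                   0.18, -0.15,  0.98,  1.00),
                nrow = 4, byrow = TRUE,
                dimnames = list(vars, vars))

# Keep each pair once by taking the upper triangle (diagonal excluded)
idx <- which(upper.tri(r_cps), arr.ind = TRUE)
pair_tbl <- data.frame(var1 = rownames(r_cps)[idx[, "row"]],
                       var2 = colnames(r_cps)[idx[, "col"]],
                       r    = r_cps[idx])

pair_tbl[which.max(pair_tbl$r), ]       # highest positive
pair_tbl[which.min(pair_tbl$r), ]       # greatest negative
pair_tbl[which.min(abs(pair_tbl$r)), ]  # closest to zero
```

The same three lookups work on the full matrix `r` computed from the data, since only the matrix and its dimnames are used.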

Positive correlation with wage: educ (0.38), age (0.18), exper (0.09). Strong positive: none. Negative: none. Strong negative: none.

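Highest positive: experience and age (0.98). Greatest negative: educ and exper (-0.35). Closest to zero: wage and exper (0.09).

Q8 Hide the messages, the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.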
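As a minimal sketch of the hint, assuming the page is knit with knitr (the engine behind R Markdown), the three options can be set once for all chunks from a setup chunk instead of being repeated in every chunk header. Setting them globally is one possible choice, not the only way to satisfy Q8.

```r
# Put this in the first chunk of the .Rmd file.
# It sets defaults for all later chunks:
knitr::opts_chunk$set(
  message = FALSE,   # suppress package start-up and other messages
  echo    = FALSE,   # do not show the source code
  results = "hide"   # do not show printed text output
)
```

The same options can go in an individual chunk header instead, for example `{r, message=FALSE, echo=FALSE, results='hide'}`. Plots are still shown with these settings.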

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.