Quiz on Correlation

Chapter 1: Visualizing two variables

library(openintro)
library(ggplot2)
library(dplyr)

# Load data
data(countyComplete) # It comes from the openintro package

# Create a new variable, rural
countyComplete$rural <- ifelse(countyComplete$density < 500, "rural", "urban")
countyComplete$rural <- factor(countyComplete$rural)

1.1 Scatterplots

# Load packages
library(ggplot2)

# Scatterplot of bachelors vs. per capita income
ggplot(data = countyComplete, aes(x = per_capita_income, y = bachelors)) + geom_point()

1.2 Boxplots as discretized/conditioned scatterplots

# Boxplots of bachelors within five equal-width bins of per capita income
ggplot(data = countyComplete, 
       aes(x = cut(per_capita_income, breaks = 5), y = bachelors)) + 
  geom_boxplot()
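
To show what cut(per_capita_income, breaks = 5) does before the boxplots are drawn, here is a minimal sketch on a small made-up vector. The income values are invented for illustration and are not taken from countyComplete.

```r
# Made-up income values for illustration (not from countyComplete)
income <- c(10000, 15000, 22000, 31000, 41000, 60000)

# breaks = 5 asks cut() for five equal-width intervals spanning the range
income_bin <- cut(income, breaks = 5)

# The result is a factor; each value is labelled with its interval
nlevels(income_bin)
table(income_bin)
```

Because the result is a factor, ggplot2 treats it as a discrete x variable and draws one box per interval.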

Interpretation

Both the scatterplot and the boxplots show a positive, roughly linear association between bachelors and per capita income: bachelors tends to increase as per capita income increases. Most of the points lie between 10,000 and 30,000 in per capita income. There are some extreme values, but the overall positive trend remains clear.

1.3 Creating scatterplots

# Load the package
library(openintro)

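# Scatterplot of bachelors vs. per capita income, colored by rural/urban
# (rural is already a factor, so it can be mapped to color directly)
ggplot(data = countyComplete, aes(x = per_capita_income, y = bachelors, color = rural)) +
  geom_point()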

Chapter 2: Correlation

2.1 Computing correlation

The cor(x, y) function computes the Pearson product-moment correlation between two numeric variables, x and y.
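
As a minimal sketch of what cor() computes, the Pearson coefficient can be written out by hand and compared with the built-in function. The vectors x and y are made-up illustration values, not columns of countyComplete, and pearson_by_hand is a hypothetical helper name.

```r
# Toy vectors, invented for illustration only (not from countyComplete)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson r: sum of cross-deviations divided by the square root of the
# product of the sums of squared deviations
pearson_by_hand <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_by_hand(x, y)
r_builtin <- cor(x, y)

# The two values should agree up to floating-point error
all.equal(r_manual, r_builtin)
```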

# Load the package
library(dplyr)

# Compute correlation
countyComplete %>%
  summarize(N = n(), r = cor(per_capita_income, bachelors))
##      N         r
## 1 3143 0.7924464

# Compute correlation for all non-missing pairs
countyComplete %>%
  summarize(N = n(), r = cor(per_capita_income, bachelors, use = "pairwise.complete.obs"))
##      N         r
## 1 3143 0.7924464
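
The two calls above return the same value, which means neither column has missing values. The use argument only changes the result when they do. The following sketch, on made-up vectors that are not from countyComplete, shows the difference.

```r
# Toy vectors with one missing value, invented for illustration
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 4, 5)

# Default (use = "everything"): any NA in the inputs yields NA
r_default <- cor(x, y)

# Pairwise-complete: the pair containing the NA is dropped before computing r
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# Same result as computing r on the four complete pairs directly
r_complete <- cor(x[1:4], y[1:4])
```

Note that n() inside summarize() counts all rows, so N would not shrink even if some pairs were dropped.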

Interpretation

The correlation coefficient between per capita income and bachelors is approximately 0.79, which indicates a strong positive linear relationship.


Erik Armskog
