library(openintro)
library(ggplot2)
library(dplyr)

# Load data
data(countyComplete) # It comes from the openintro package

# Create a new variable, rural
countyComplete$rural <- ifelse(countyComplete$density < 500, "rural", "urban")
countyComplete$rural <- factor(countyComplete$rural)

Identify what variables are strongly associated with the standard of living. To that end, do the following:

Create a scatterplot of per_capita_income and bachelors in the data set (1.1 Scatterplots).

# Load packages
library(ggplot2)


ggplot(data = countyComplete, aes(x = per_capita_income, y = bachelors)) + geom_point()

Interpretation: positive linear relationship between per capita income and bachelors

Create a boxplot of per_capita_income and bachelors (1.2 Boxplots as discretized/conditioned scatterplots).

1.2 Boxplots as discretized/conditioned scatterplots

# Boxplot of weight vs. weeks
ggplot(data = countyComplete, 
       aes(x = cut(per_capita_income, breaks = 5), y = bachelors)) + 
  geom_boxplot()

Interpretation: those with bachelors degrees are above the average per capita income

Add the rural variable to the scatterplot of per_capita_income and bachelors to see whether there is any difference in their relationship between rural and urban (1.3 Creating scatterplots).

1.3 Creating scatterplots

# Load the package
library(openintro)

# Body dimensions scatterplot
ggplot(data = countyComplete, aes(x = per_capita_income, y = bachelors, color = factor(rural))) +
  geom_point()

Interpretation: between urban and rural, people with a bachelors earn more in an urban setting

Compute correlation coefficient between income_per_capita and bachelors (2.1 Computing correlation). Interpret it. Keep in mind that correlation coefficients do not show causation but only association.

2.1 Computing correlation

The cor(x, y) function will compute the Pearson product-moment correlation between variables, x and y.

# Load the package
library(dplyr)

# Compute correlation
countyComplete %>%
  summarize(N = n(), r = cor(per_capita_income, bachelors))
##      N         r
## 1 3143 0.7924464

# Compute correlation for all non-missing pairs
countyComplete %>%
  summarize(N = n(), r = cor(per_capita_income, bachelors, use = "pairwise.complete.obs"))
##      N         r
## 1 3143 0.7924464