R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

Load Libraries

library(ggplot2)

Creating dataset

set.seed(0)

# Create the dataset
customer_data <- data.frame(
  age = sample(18:80, 500, replace = TRUE),
  income = sample(20000:120000, 500, replace = TRUE),
  marital_status = sample(c("Single", "Married", "Divorced", "Widowed"), 500, replace = TRUE),
  health_insurance = sample(c(0, 1), 500, replace = TRUE),
  housing_type = sample(c("Rented", "Owned", "Mortgaged", "Free"), 500, replace = TRUE)
)

head(customer_data)
##   age income marital_status health_insurance housing_type
## 1  31  75676       Divorced                1    Mortgaged
## 2  74  50612       Divorced                1        Owned
## 3  21  67694        Widowed                0    Mortgaged
## 4  56  32283        Widowed                0        Owned
## 5  18  55519        Married                1    Mortgaged
## 6  51  66463       Divorced                0       Rented

Question 1 : What is the distribution of customer ages?

# Summary statistics and a density plot of customer age
summary(customer_data$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   37.00   52.00   50.14   63.00   80.00
ggplot(customer_data, aes(x=age)) + geom_density()
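
A density curve smooths over the raw counts, so a banded frequency table can be a useful cross-check on the shape. The sketch below is one possible way to do that in base R; `age_band_counts` is a hypothetical helper (not part of the original analysis), and the toy vector is made up for illustration.

```r
# Hypothetical helper: count observations in fixed-width, left-closed age bands.
age_band_counts <- function(age, width = 10) {
  lo <- floor(min(age) / width) * width
  hi <- ceiling((max(age) + 1) / width) * width  # +1 so the maximum falls inside a band
  table(cut(age, breaks = seq(lo, hi, by = width), right = FALSE))
}

# Toy ages, for illustration only
toy_ages <- c(18, 25, 31, 39, 80)
toy_counts <- age_band_counts(toy_ages)
print(toy_counts)

# Apply to the real data when the setup code has been run
if (exists("customer_data")) print(age_band_counts(customer_data$age))
```

Roughly equal counts per band would suggest a flat distribution, which is what uniform sampling of `18:80` should produce.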

Question 2 : How is income distributed among the customer population? Also comment on the modality.

# The income distribution in this data set appears multimodal, with two prominent peaks at roughly 25k-50k and 75k-100k.
summary(customer_data$income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20132   46050   73438   71393   94383  119962
ggplot(customer_data, aes(x=income)) + geom_density()
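
Reading modality off a density plot is subjective, and because income here is sampled uniformly from `20000:120000`, visible peaks may partly reflect sampling noise and the chosen bandwidth. As a rough cross-check, the sketch below counts local maxima of a kernel density estimate. `count_density_peaks` and its `min_rel_height` threshold are hypothetical names introduced for illustration, and the result depends on the default bandwidth of `density()`.

```r
# Hypothetical helper: count local maxima of a kernel density estimate,
# ignoring bumps lower than min_rel_height times the tallest peak.
count_density_peaks <- function(x, min_rel_height = 0.1) {
  d <- density(x)                     # Gaussian kernel, default bandwidth
  turn <- diff(sign(diff(d$y)))       # -2 where the slope flips from rising to falling
  peak_heights <- d$y[which(turn == -2) + 1]
  sum(peak_heights >= min_rel_height * max(d$y))
}

# Toy sample with two well-separated modes, for illustration only
set.seed(1)
toy_income <- c(rnorm(500, mean = 0), rnorm(500, mean = 10))
toy_peaks <- count_density_peaks(toy_income)
print(toy_peaks)

# Apply to the real data when the setup code has been run
if (exists("customer_data")) print(count_density_peaks(customer_data$income))
```

A different bandwidth (for example via the `adjust` argument of `density()`) can change the count, so the number is best treated as a sanity check rather than a test.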

Question 3 : Is there a relationship between age and income?

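# Keep only plausible values (no rows are dropped here: ages are 18-80 and incomes 20k-120k),
# then compute the correlation on the same filtered data frame for both variables
customer_data2 <- subset(customer_data, 
                         0 < age & age < 100 & 0 < income & income < 200000)
cor(customer_data2$age, customer_data2$income)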
## [1] -0.04395015
ggplot(customer_data, aes(x=age, y=income)) +
  geom_point() + geom_smooth() + 
  ggtitle("Income as a Function of Age")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
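
A correlation of about -0.044 is very close to zero; with 500 observations it corresponds to a t statistic of roughly -0.98 on 498 degrees of freedom, well short of conventional significance thresholds. One way to make that check explicit is `cor.test()` from base R, sketched below on made-up toy vectors (the names `toy_x`, `toy_y`, and `toy_test` are illustrative only).

```r
# Toy vectors, for illustration only: two independent uniform samples
set.seed(1)
toy_x <- runif(200)
toy_y <- runif(200)

# Pearson correlation with a confidence interval and p-value
toy_test <- cor.test(toy_x, toy_y)
print(toy_test$estimate)
print(toy_test$conf.int)
print(toy_test$p.value)

# Apply to the real data when the setup code has been run
if (exists("customer_data")) {
  print(cor.test(customer_data$age, customer_data$income))
}
```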

Question 4 : What is the distribution of marital status in the customer base?

# Bar chart of marital status counts; a constant colour is set outside aes()
ggplot(customer_data, aes(x=marital_status)) +
  geom_bar(fill="red")
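
The bar chart shows counts visually; a small table of counts and shares gives the same information as numbers. This is a minimal sketch in base R, where `category_shares` is a hypothetical helper and the toy vector is made up for illustration.

```r
# Hypothetical helper: counts and proportions for each level of a categorical vector.
category_shares <- function(x) {
  counts <- table(x)
  data.frame(
    level = names(counts),
    n     = as.vector(counts),
    share = as.vector(prop.table(counts))
  )
}

# Toy statuses, for illustration only
toy_status <- c("Single", "Married", "Married", "Widowed")
toy_shares <- category_shares(toy_status)
print(toy_shares)

# Apply to the real data when the setup code has been run
if (exists("customer_data")) print(category_shares(customer_data$marital_status))
```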

Question 5 : How does health insurance coverage vary across marital statuses?

# health_insurance is numeric (0/1), so it is converted to a factor for the fill aesthetic;
# mapping the raw numeric column makes ggplot drop the fill during the count transformation.
# Stacked bars: total customers per marital status, split by coverage
ggplot(customer_data, aes(x=marital_status, fill=factor(health_insurance))) +
  geom_bar()

# Side-by-side bars: easier to compare insured and uninsured counts within each status
ggplot(customer_data, aes(x=marital_status, fill=factor(health_insurance))) +
  geom_bar(position = "dodge")
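
Because `health_insurance` is coded 0/1, its mean within each marital status is the share of insured customers in that group, which makes the groups directly comparable even when their sizes differ. The sketch below shows one way to compute this with `tapply()`; `coverage_by_group` is a hypothetical helper and the toy vectors are made up for illustration.

```r
# Hypothetical helper: share of insured customers (mean of a 0/1 flag) per group.
coverage_by_group <- function(group, insured) {
  tapply(insured, group, mean)
}

# Toy data, for illustration only
toy_group   <- c("Single", "Single", "Married", "Married")
toy_insured <- c(1, 0, 1, 1)
toy_rates <- coverage_by_group(toy_group, toy_insured)
print(toy_rates)

# Apply to the real data when the setup code has been run
if (exists("customer_data")) {
  print(coverage_by_group(customer_data$marital_status, customer_data$health_insurance))
}
```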