This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
Load Libraries
library(ggplot2)
Creating dataset
set.seed(0)
# Create the dataset
customer_data <- data.frame(
age = sample(18:80, 500, replace = TRUE),
income = sample(20000:120000, 500, replace = TRUE),
marital_status = sample(c("Single", "Married", "Divorced", "Widowed"), 500, replace = TRUE),
health_insurance = sample(c(0, 1), 500, replace = TRUE),
housing_type = sample(c("Rented", "Owned", "Mortgaged", "Free"), 500, replace = TRUE)
)
head(customer_data)
## age income marital_status health_insurance housing_type
## 1 31 75676 Divorced 1 Mortgaged
## 2 74 50612 Divorced 1 Owned
## 3 21 67694 Widowed 0 Mortgaged
## 4 56 32283 Widowed 0 Owned
## 5 18 55519 Married 1 Mortgaged
## 6 51 66463 Divorced 0 Rented
Question 1 : What is the distribution of customer ages?
#Summary statistics and a kernel density plot of customer age
summary(customer_data$age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 37.00 52.00 50.14 63.00 80.00
ggplot(customer_data, aes(x=age)) + geom_density()
Question 2 : How is income distributed among the customer population?
Also comment on the modality.
#The income distribution in this data set appears bimodal, with two prominent peaks at roughly 25k to 50k and 75k to 100k.
summary(customer_data$income)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20132 46050 73438 71393 94383 119962
ggplot(customer_data, aes(x=income)) + geom_density()
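Because income is drawn uniformly with sample(20000:120000, ...), the peaks in a density plot may partly reflect sampling noise and the kernel bandwidth rather than distinct income groups. The sketch below is one rough way to check this: it counts local maxima of the base R density() estimate at three bandwidth settings. The helper count_peaks() and the chosen adjust values are illustrative assumptions, not part of the original analysis.

```r
# Recreate age and income as in the "Creating dataset" step
# (they are the first two draws after set.seed(0))
set.seed(0)
customer_data <- data.frame(
  age = sample(18:80, 500, replace = TRUE),
  income = sample(20000:120000, 500, replace = TRUE)
)

# count_peaks() is a hypothetical helper: it counts strict local maxima
# of the estimated density curve (slope changes from positive to negative)
count_peaks <- function(dens) sum(diff(sign(diff(dens$y))) == -2)

# Narrower (0.5), default (1), and wider (2) bandwidths
peaks_by_bandwidth <- sapply(
  c(0.5, 1, 2),
  function(a) count_peaks(density(customer_data$income, adjust = a))
)
names(peaks_by_bandwidth) <- c("adjust = 0.5", "adjust = 1", "adjust = 2")
peaks_by_bandwidth
```

If the number of peaks changes noticeably with the bandwidth, the modality seen in the plot should be read with caution.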
Question 3 : Is there a relationship between age and income?
#Keep only plausible ages and incomes, then compute the Pearson correlation on the filtered data
customer_data2 <- subset(customer_data,
0 < age & age < 100 & 0 < income & income < 200000)
cor(customer_data2$age, customer_data2$income)
## [1] -0.04395015
ggplot(customer_data, aes(x=age, y=income)) +
geom_point() + geom_smooth() +
ggtitle("Income as a Function of Age")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
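A correlation this close to zero suggests little or no linear relationship between age and income. As a minimal sketch, cor.test() from base R reports a confidence interval and p-value for the same Pearson correlation; the variable name ct is an assumption used only for illustration.

```r
# Recreate age and income as in the "Creating dataset" step
# (they are the first two draws after set.seed(0))
set.seed(0)
customer_data <- data.frame(
  age = sample(18:80, 500, replace = TRUE),
  income = sample(20000:120000, 500, replace = TRUE)
)

# Test of the null hypothesis that the true Pearson correlation is zero
ct <- cor.test(customer_data$age, customer_data$income)

ct$estimate   # sample correlation
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # p-value of the test
```

A confidence interval that contains zero would be consistent with the flat smoothing line in the scatter plot.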
Question 4 : What is the distribution of marital status in the customer base?
#Bar chart of customer counts by marital status; fill is set outside aes() so the bars are actually drawn in red
ggplot(customer_data, aes(x=marital_status)) +
geom_bar(fill="red")
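The bar chart can be backed up with the underlying numbers. The following is a small sketch using base R table() and prop.table(); the names status_counts and status_share are assumptions chosen for illustration.

```r
# Recreate the first three columns as in the "Creating dataset" step;
# the later columns are drawn afterwards and are not needed here
set.seed(0)
customer_data <- data.frame(
  age = sample(18:80, 500, replace = TRUE),
  income = sample(20000:120000, 500, replace = TRUE),
  marital_status = sample(c("Single", "Married", "Divorced", "Widowed"), 500, replace = TRUE)
)

# Number of customers in each marital status
status_counts <- table(customer_data$marital_status)
status_counts

# Share of the customer base in each marital status
status_share <- prop.table(status_counts)
round(status_share, 3)
```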
Question 5 : How does health insurance coverage vary across marital statuses?
#health_insurance is stored as numeric 0/1, so it is converted to a factor before being mapped to fill; with a numeric variable ggplot2 drops the fill aesthetic and the bars are not split by coverage.
ggplot(customer_data, aes(x=marital_status, fill=factor(health_insurance))) +
geom_bar(position = "dodge") +
labs(fill = "health_insurance")
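Coverage by marital status can also be compared numerically. As a rough sketch, a cross-tabulation with row proportions gives the share of insured (1) and uninsured (0) customers within each marital status; the names coverage_counts and coverage_share are assumptions chosen for illustration.

```r
# Recreate the first four columns as in the "Creating dataset" step;
# housing_type is drawn last and is not needed here
set.seed(0)
customer_data <- data.frame(
  age = sample(18:80, 500, replace = TRUE),
  income = sample(20000:120000, 500, replace = TRUE),
  marital_status = sample(c("Single", "Married", "Divorced", "Widowed"), 500, replace = TRUE),
  health_insurance = sample(c(0, 1), 500, replace = TRUE)
)

# Counts of uninsured (0) and insured (1) customers per marital status
coverage_counts <- table(customer_data$marital_status,
                         customer_data$health_insurance)
coverage_counts

# Row proportions: each row sums to 1, so the "1" column is the coverage rate
coverage_share <- prop.table(coverage_counts, margin = 1)
round(coverage_share, 3)
```

Row proportions are used so that statuses with different customer counts can be compared directly.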