Your task this week is to prepare your own descriptive analysis of the “CreditCard” dataset (AER package). It is a cross-sectional data frame on the credit history of a sample of applicants for a type of credit card.
Are yearly income (in USD 10,000), credit card expenditure, age, and the ratio of monthly credit card expenditure to yearly income significantly different between applicants with different credit risk (the factor variable “card”, which records whether the credit card application was accepted)?
Prepare professional data visualizations and descriptive statistics tables, and interpret them.
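library(AER)          # CreditCard data
library(dplyr)        # select(), mutate(), %>%
library(summarytools) # stby(), descr()
data("CreditCard", package = "AER")
# Ratio: average monthly expenditure / monthly income (yearly income in USD / 12)
selected_data <- CreditCard %>% select(card, income, expenditure, age) %>%
  mutate(expenditure_income_ratio = expenditure / (income * 10000 / 12))
(stats_by_card <- stby(data = selected_data, INDICES = selected_data$card, FUN = descr, stats = "common", transpose = TRUE))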
## Non-numerical variable(s) ignored: card
## Descriptive Statistics
## selected_data
## Group: card = no
## N: 296
##
## Mean Std.Dev Min Median Max N.Valid Pct.Valid
## ------------------------------ ------- --------- ------ -------- ------- --------- -----------
## age 33.20 9.92 0.75 31.83 80.17 296.00 100.00
## expenditure 0.00 0.00 0.00 0.00 0.00 296.00 100.00
## expenditure_income_ratio 0.00 0.00 0.00 0.00 0.00 296.00 100.00
## income 3.07 1.62 0.49 2.59 11.00 296.00 100.00
##
## Group: card = yes
## N: 1023
##
## Mean Std.Dev Min Median Max N.Valid Pct.Valid
## ------------------------------ -------- --------- ------ -------- --------- --------- -----------
## age 33.22 10.21 0.17 31.08 83.50 1023.00 100.00
## expenditure 238.60 287.71 0.00 150.18 3099.50 1023.00 100.00
## expenditure_income_ratio 0.09 0.10 0.00 0.06 0.91 1023.00 100.00
## income 3.45 1.71 0.21 3.00 13.50 1023.00 100.00
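The stby() output above prints one block per group, which makes direct comparison awkward. Below is a minimal sketch of a more compact, report-ready table. It uses base R only and puts each variable's group means and medians side by side with the difference in means. The helper name group_comparison_table() is hypothetical and not part of summarytools or AER.

```r
# Hypothetical helper: one row per numeric variable, with the mean and median
# of each of the two groups side by side and the difference in means
# (second group minus first group).
group_comparison_table <- function(df, group) {
  g <- droplevels(as.factor(df[[group]]))
  stopifnot(nlevels(g) == 2)
  # numeric columns only; the grouping column is excluded explicitly
  vars <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], group)
  rows <- lapply(vars, function(v) {
    parts <- split(df[[v]], g)  # list ordered by factor levels
    data.frame(
      variable = v,
      mean_1 = mean(parts[[1]]), mean_2 = mean(parts[[2]]),
      median_1 = median(parts[[1]]), median_2 = median(parts[[2]]),
      diff_means = mean(parts[[2]]) - mean(parts[[1]]),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  # replace the _1/_2 suffixes with the actual group labels
  names(out) <- sub("_1$", paste0("_", levels(g)[1]), names(out))
  names(out) <- sub("_2$", paste0("_", levels(g)[2]), names(out))
  out
}
```

For example, `knitr::kable(group_comparison_table(selected_data, "card"), digits = 2)` would render the comparison as a formatted table in an R Markdown report.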
library(ggplot2)
ggplot(CreditCard, aes(x = card, y = income, fill = card)) +
  geom_boxplot() +
  labs(x = "Card Status", y = "Yearly Income (USD 10,000)",
       title = "Yearly Income by Card Status") +
  theme_minimal()
ggplot(CreditCard, aes(x = income, fill = card)) +
  geom_histogram(position = "dodge", bins = 30) +
  labs(title = "Income Distribution by Card Status",
       x = "Yearly Income (USD 10,000)",
       y = "Frequency",
       fill = "Card Status") +
  theme_minimal()
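Applicants whose card application was accepted (card = yes) have a higher yearly income than rejected applicants. Their mean is 3.45 vs 3.07 (about USD 34,500 vs USD 30,700), and their median is 3.00 vs 2.59. Both distributions are right-skewed (means above medians, maxima of 13.50 and 11.00), so the medians are the more robust comparison. Whether the gap is statistically significant still requires a formal test.
Credit card expenditure and the ratio of monthly credit card expenditure to yearly income: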
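The means and boxplots describe the samples but do not test whether income differs significantly between the groups. Below is a minimal sketch using base R's t.test(), which runs a Welch test by default, and wilcox.test(), a rank-based test that is less sensitive to the right skew in the income histogram. The helper name compare_two_groups() is hypothetical, and the choice of these two tests is one reasonable option rather than the assignment's prescribed method.

```r
# Hypothetical helper: compare a numeric variable between the two levels of a
# grouping factor with a Welch t-test (difference in means) and a Wilcoxon
# rank-sum test (shift in distributions); returns group means and p-values.
compare_two_groups <- function(x, g) {
  g <- droplevels(as.factor(g))
  stopifnot(nlevels(g) == 2, length(x) == length(g))
  parts <- split(x, g)                       # ordered by factor levels
  welch <- t.test(parts[[1]], parts[[2]])    # var.equal = FALSE -> Welch
  wilcox <- wilcox.test(parts[[1]], parts[[2]], exact = FALSE)  # normal approx., handles ties
  data.frame(
    group_1 = levels(g)[1],
    group_2 = levels(g)[2],
    mean_1 = mean(parts[[1]]),
    mean_2 = mean(parts[[2]]),
    welch_p = welch$p.value,
    wilcoxon_p = wilcox$p.value,
    stringsAsFactors = FALSE
  )
}
```

Applied as `compare_two_groups(CreditCard$income, CreditCard$card)`, a p-value below 0.05 would indicate a significant income difference at the conventional 5% level. The 5% threshold is itself an assumption, not something the assignment specifies.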
ggplot(CreditCard, aes(x = card, y = expenditure, fill = card)) +
  geom_boxplot() +
  labs(x = "Card Status", y = "Average Monthly Credit Card Expenditure",
       title = "Credit Card Expenditure by Card Status") +
  theme_minimal()
ggplot(selected_data, aes(x = card, y = expenditure_income_ratio, fill = card)) +
  geom_boxplot() +
  labs(x = "Card Status", y = "Ratio of Monthly Expenditure to Yearly Income",
       title = "Ratio of Monthly Expenditure to Yearly Income by Card Status") +
  theme_minimal()
ggplot(selected_data, aes(x = expenditure_income_ratio, fill = card)) +
  geom_histogram(position = "dodge", bins = 30) +
  labs(title = "Monthly Expenditure to Yearly Income Ratio by Card Status",
       x = "Expenditure to Income Ratio",
       y = "Frequency",
       fill = "Card Status") +
  theme_minimal()
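All rejected applicants (card = no) have zero credit card expenditure (mean, median, minimum and maximum are all 0.00), so their expenditure-to-income ratio is zero as well. Accepted applicants spend 238.60 per month on average (median 150.18), which corresponds to a mean ratio of 0.09 (median 0.06). Because rejected applicants have no card to spend on, this difference follows from how the groups are defined rather than being a finding that needs a significance test.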
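The table rounds to two decimals, so a displayed 0.00 could in principle hide small positive values. Below is a minimal sketch of how the zero-expenditure claim could be checked exactly. It also shows how many accepted applicants spent nothing (the "yes" group's minimum is also 0.00). The helper name zero_share_by_group() is hypothetical.

```r
# Hypothetical helper: for each group, count all observations, the number
# with a value of exactly zero, and the share of zeros.
zero_share_by_group <- function(x, g) {
  g <- as.factor(g)
  n_zero <- tapply(x == 0, g, sum)   # TRUE counts as 1
  n_total <- tapply(x, g, length)
  data.frame(
    group = levels(g),
    n = as.vector(n_total),
    n_zero = as.vector(n_zero),
    share_zero = as.vector(n_zero / n_total),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

With `zero_share_by_group(CreditCard$expenditure, CreditCard$card)`, a share_zero of 1 in the "no" row would confirm that every rejected applicant has zero expenditure.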
ggplot(CreditCard, aes(x = card, y = age, fill = card)) +
  geom_boxplot() +
  labs(x = "Card Status", y = "Age",
       title = "Age by Card Status") +
  theme_minimal()
ggplot(CreditCard, aes(x = age, fill = card)) +
  geom_histogram(position = "dodge", bins = 30) +
  labs(title = "Age Distribution by Card Status",
       x = "Age (years)",
       y = "Frequency",
       fill = "Card Status") +
  theme_minimal()
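Mean age is almost identical in the two groups: 33.20 years for rejected vs 33.22 years for accepted applicants, with medians of 31.83 vs 31.08. The descriptive statistics therefore show no meaningful age difference. Calling the difference non-significant would require a formal test, and equal means alone do not show that age has no effect on acceptance. The minimum ages (0.75 and 0.17 years) are implausible for credit card applicants; they are most likely data errors and should be checked or excluded.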
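Below is a minimal sketch that drops implausible ages and then tests the age difference formally on the remaining observations. The 18-year cutoff is an assumption (a typical minimum age for a credit contract), not something stated in the dataset documentation. The helper name age_check() is hypothetical.

```r
# Hypothetical helper: drop missing or implausible ages (below `min_age`,
# assumed to be 18), then compare the remaining ages between the two groups
# with a Welch t-test.
age_check <- function(age, g, min_age = 18) {
  g <- droplevels(as.factor(g))
  keep <- !is.na(age) & age >= min_age
  parts <- split(age[keep], g[keep])       # ordered by factor levels
  welch <- t.test(parts[[1]], parts[[2]])  # var.equal = FALSE -> Welch
  list(
    n_dropped = sum(!keep),
    mean_by_group = sapply(parts, mean),
    welch_p = welch$p.value
  )
}
```

Running `age_check(CreditCard$age, CreditCard$card)` reports how many observations fall below the cutoff, the cleaned group means, and a p-value for the difference in mean age.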