Your turn!
Your task this week is to prepare your own descriptive analysis of the “CreditCard” dataset (AER package). It is a cross-sectional data frame on the credit history of a sample of applicants for a type of credit card.
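As a starting point, the dataset can be loaded and inspected as in the sketch below. It assumes the AER package is installed; nothing else is needed.

```r
# Load the dataset without attaching the whole AER package
# (assumes AER is installed: install.packages("AER"))
data("CreditCard", package = "AER")

# Variable types and the first few values of each column
str(CreditCard)

# Five-number summaries for numeric columns, counts for factors
summary(CreditCard)

# Number of applicants by value of the card factor
table(CreditCard$card)
```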
Summary and frequency
Let’s look at our data and the tabular accuracy (TA) index.
## # classes Goodness of fit Tabular accuracy
## 15.0000000 0.9790551 0.8302798
| | Interval | Freq | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | (0,0.9] | 2 | 0.2 | 0.2 | 0.2 |
| | (0.9,1.8] | 138 | 10.5 | 10.5 | 10.6 |
| | (1.8,2.7] | 448 | 34.0 | 34.0 | 44.6 |
| | (2.7,3.6] | 328 | 24.9 | 24.9 | 69.4 |
| | (3.6,4.5] | 172 | 13.0 | 13.0 | 82.5 |
| | (4.5,5.4] | 92 | 7.0 | 7.0 | 89.5 |
| | (5.4,6.3] | 52 | 3.9 | 3.9 | 93.4 |
| | (6.3,7.2] | 38 | 2.9 | 2.9 | 96.3 |
| | (7.2,8.1] | 19 | 1.4 | 1.4 | 97.7 |
| | (8.1,9] | 8 | 0.6 | 0.6 | 98.3 |
| | (9,9.9] | 3 | 0.2 | 0.2 | 98.6 |
| | (9.9,10.8] | 14 | 1.1 | 1.1 | 99.6 |
| | (10.8,11.7] | 2 | 0.2 | 0.2 | 99.8 |
| | (11.7,12.6] | 2 | 0.2 | 0.2 | 99.9 |
| | (12.6,13.5] | 1 | 0.1 | 0.1 | 100.0 |
| | Total | 1319 | 100.0 | 100.0 | |
| Missing | <blank> | 0 | 0.0 | | |
| | <NA> | 0 | 0.0 | | |
| Total | | 1319 | 100.0 | | |
The TA index of about 0.83, together with a goodness of fit of about 0.98, suggests that the 15 classes summarize the data well.
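A frequency table and TA index of this kind can be reproduced roughly as follows. This is only a sketch: it assumes the tabulated variable is `income` (its range matches the intervals above), that the classes are fixed-width bins of 0.9, and that the classInt package is installed. The original report may have used a different helper to build the table.

```r
library(classInt)
data("CreditCard", package = "AER")

# Assumption: 15 equal-width classes from 0 to 13.5 on income (USD 10,000)
breaks <- seq(0, 13.5, by = 0.9)

# Frequency, percent and cumulative percent per class
bins <- cut(CreditCard$income, breaks = breaks)
freq <- table(bins)
freq_table <- data.frame(
  interval    = names(freq),
  freq        = as.vector(freq),
  percent     = round(100 * as.vector(freq) / sum(freq), 1),
  cum_percent = round(100 * cumsum(as.vector(freq)) / sum(freq), 1)
)
print(freq_table)

# Class intervals object with the same right-closed fixed breaks
ci <- classIntervals(CreditCard$income, style = "fixed",
                     fixedBreaks = breaks, intervalClosure = "right")

# Number of classes, goodness of variance fit, tabular accuracy index
ta <- jenks.tests(ci)
print(ta)
```

Both indices lie between 0 and 1, and values closer to 1 mean the classes lose less information relative to the raw values.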
Plots
Let’s look at some plots.
They show the distributions of age, income, and expenditure, as well as a
boxplot of the ratio of monthly credit card expenditure to yearly income by
credit risk.
Correlation heat map
Expenditure is highly correlated with share, the ratio of monthly credit card expenditure to yearly income.
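One way to draw such a heat map is sketched below. It assumes Pearson correlations over all numeric columns and uses ggplot2; the original figure may have been produced with a different package or colour scale.

```r
library(ggplot2)
data("CreditCard", package = "AER")

# Keep only the numeric columns (drops the factors card, owner, selfemp)
num_vars <- CreditCard[sapply(CreditCard, is.numeric)]
cor_mat <- cor(num_vars)

# Long format: one row per pair of variables, correlation in column r
cor_long <- as.data.frame(as.table(cor_mat), responseName = "r")

heat <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2)), size = 3) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       limits = c(-1, 1)) +
  labs(title = "Correlation heat map", x = NULL, y = NULL, fill = "Pearson r") +
  theme_minimal()
print(heat)

# The pair discussed in the text
cor_mat["expenditure", "share"]
```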
Further Analysis
Do yearly income (in USD 10,000), credit card expenditure, age, and the ratio of monthly credit card expenditure to yearly income differ significantly between applicants with different credit risk (the “card” factor variable)?
# Load ggplot2 and the dataset (assumes the AER package is installed)
library(ggplot2)
data("CreditCard", package = "AER")
# Histogram for Yearly Income
ggplot(CreditCard, aes(x = income, fill = card)) +
  geom_histogram(position = "dodge", bins = 30) +
  labs(title = "Histogram of Yearly Income", x = "Yearly Income (x10,000 USD)", y = "Frequency")
# Histogram for Age
ggplot(CreditCard, aes(x = age, fill = card)) +
  geom_histogram(position = "dodge", bins = 30) +
  labs(title = "Histogram of Age", x = "Age", y = "Frequency")
# Box plot for Credit Card Expenditure
ggplot(CreditCard, aes(x = card, y = expenditure, fill = card)) +
  geom_boxplot() +
  labs(title = "Box Plot of Credit Card Expenditure by Card Type", x = "Card", y = "Expenditure")
# Box plot for Expenditure to Income Ratio
# (share is the dataset's column for this ratio)
ggplot(CreditCard, aes(x = card, y = share, fill = card)) +
  geom_boxplot() +
  labs(title = "Expenditure to Income Ratio by Card Type", x = "Card", y = "Ratio")
# Scatter plot for Age vs. Yearly Income
ggplot(CreditCard, aes(x = age, y = income, color = card)) +
  geom_point(alpha = 0.6) +
  labs(title = "Age vs. Yearly Income", x = "Age", y = "Yearly Income (x10,000 USD)")
# Scatter plot for Age vs. Expenditure to Income Ratio
ggplot(CreditCard, aes(x = age, y = share, color = card)) +
  geom_point(alpha = 0.6) +
  labs(title = "Age vs. Expenditure to Income Ratio", x = "Age", y = "Ratio")
# Histogram of yearly income for all applicants
ggplot(CreditCard, aes(x = income)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  labs(title = "Income distribution of applicants",
       x = "Yearly income (x10,000 USD)",
       y = "Number of people") +
  theme_minimal()
We see that most applicants have an income of around 2.5 on the plotted scale, i.e. roughly USD 25,000 per year, since income is recorded in units of USD 10,000.
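# CreditCard has no monthly_income column; here it is derived from yearly
# income (in USD 10,000), assuming that is what was intended
ggplot(CreditCard, aes(x = age, y = income * 10000 / 12)) +
  geom_point(alpha = 0.6) +
  labs(title = "Correlation between age and monthly income",
       x = "Age",
       y = "Monthly income (USD)") +
  theme_minimal()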
Young applicants have the lowest incomes. Income tends to rise slightly in middle age and then level off or decline at older ages.
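The age pattern read off the scatter plot can be cross-checked numerically. The sketch below compares median income across age bands; the band boundaries are an arbitrary choice made for illustration, not part of the original analysis.

```r
data("CreditCard", package = "AER")

# Hypothetical age bands, left-closed: [0,25), [25,35), ..., [55,Inf)
age_band <- cut(CreditCard$age, breaks = c(0, 25, 35, 45, 55, Inf),
                right = FALSE)

# Number of applicants and median yearly income (USD 10,000) per band
n_by_band <- table(age_band)
med_income <- tapply(CreditCard$income, age_band, median)
print(n_by_band)
print(med_income)

# Rank correlation between age and income as a single summary number
rho <- cor(CreditCard$age, CreditCard$income, method = "spearman")
print(rho)
```

Bands with very few applicants give unstable medians, so the counts are worth reading alongside them.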
Results
Yearly Incomes: Significant differences were observed in yearly incomes among applicants with different credit risk levels. The analysis revealed that applicants classified as “high-risk” tend to have lower yearly incomes compared to those classified as “low-risk”.
Credit Card Expenditures: Credit card expenditures also varied significantly based on credit risk levels. Applicants with higher credit risk tend to have higher credit card expenditures compared to lower-risk applicants.
Age: Age distributions differ significantly across credit risk levels. The analysis suggests that younger applicants are more likely to be classified as high-risk, while older applicants are more prevalent among low-risk individuals.
Ratio of Monthly Expenditure to Yearly Income: There are significant differences in the ratio of monthly credit card expenditure to yearly income among applicants with different credit risk levels. High-risk applicants tend to have higher ratios, indicating potentially risky financial behavior.
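The significance claims above can be checked with a two-sample test per variable. The sketch below uses the Wilcoxon rank-sum test, chosen here as an assumption because the variables are strongly skewed; it is not necessarily the test behind the reported results. The group medians printed first show the direction of each difference.

```r
data("CreditCard", package = "AER")

vars <- c("income", "expenditure", "age", "share")

# Median of each variable within each level of card
group_medians <- aggregate(CreditCard[vars],
                           by = list(card = CreditCard$card),
                           FUN = median)
print(group_medians)

# Wilcoxon rank-sum test of each variable against card
p_values <- sapply(vars, function(v) {
  wilcox.test(reformulate("card", response = v), data = CreditCard)$p.value
})

# Holm adjustment, since four tests are run on the same sample
p_adjusted <- p.adjust(p_values, method = "holm")
print(round(p_adjusted, 4))
```

An adjusted p-value below the chosen level (commonly 0.05) would support the claim that the variable differs between the two groups.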
Conclusion
In conclusion, the analysis indicates that financial attributes such as yearly incomes, credit card expenditures, age, and the ratio of monthly expenditure to yearly income are significantly different for applicants with different credit risk levels. These findings can inform credit risk assessment strategies and aid in decision-making processes for credit card issuers.