EDA
Mosaic (Marital Status X Credit Risk X Gender)

Feature Selection
PCA

χ-squared

Conclusion
Post Mortem
Insights & Further Considerations
- The majority of customers in this data set are male, single, own a
house, and do ‘skilled’ work.
- The data is not representative of female customers
- Loan Purpose, Gender, Marital Status, and Housing had significant
relationships with Credit Risk. The relationship between Job &
Credit Risk was not significant.
- PC1 through PC5 explain 88.8% of the variance. However, removing
‘Years’ (which loads primarily on PC6) reduced the accuracy of every model
by 1.0-2.5%
- The most accurate model was RF(n = 850)*
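The significance checks behind the Credit Risk bullet above can be sketched as a χ-squared test of independence on a contingency table. This is a minimal illustration in plain Python, not the project's R code. The counts, the table names `housing_vs_risk` and `job_vs_risk`, and the helper functions are all hypothetical. No continuity correction is applied, whereas R's `chisq.test` applies one to 2x2 tables by default, so its statistic would differ slightly.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Upper 5% critical values of the chi-squared distribution for small dof.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def is_significant(table):
    """True if independence is rejected at the 5% level (dof 1 to 4 only)."""
    statistic, dof = chi_squared_independence(table)
    return statistic > CRITICAL_05[dof]


# Hypothetical counts for illustration only; NOT taken from the project's data.
# Rows are levels of the feature, columns are (good risk, bad risk).
housing_vs_risk = [
    [60, 40],
    [30, 70],
]
job_vs_risk = [
    [50, 50],
    [48, 52],
]

print(chi_squared_independence(housing_vs_risk))
print(is_significant(housing_vs_risk), is_significant(job_vs_risk))
```

A table whose rows have clearly different good/bad proportions gives a large statistic and is flagged as significant. A table whose rows are nearly identical is not.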
What went well
- I was able to start learning the basics of building interactive
plots (plotly) and dashboards (flexdashboard) in R, and sharing R
files
- The R community & the open-source nature of the language were
incredibly helpful
What didn’t
- Needed to spend more time on the ‘80%’ of data analysis (data
cleaning and preparation)
- The naming scheme interfered with dashboard rendering
- R provided an all-in-one package, but executing the overall
dashboard was difficult. Other intricacies, such as libraries &
functions, were also problematic
- Building Marimekko/Mosaic Plots in plotly & ggplot
- Program stability
Future plans
- Define a more clear-cut goal
- Feature engineering: dummy coding/one-hot encoding categorical
variables, creating categories for Age, Checking, & Savings, and
combining
- Fix current code
- Test the model with additional data
- Create dashboard slides that work together
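The feature-engineering item in the list above could look roughly like the following. This is a minimal sketch in plain Python, not the project's R code. The helper names `one_hot` and `bin_age`, the age cut-offs, and the sample values are all assumptions made for illustration. Dummy coding would be the same idea with one reference level dropped.

```python
def one_hot(values):
    """One-hot encode a list of category labels into a list of 0/1 dicts."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def bin_age(age, edges=(25, 40, 60), labels=("<25", "25-39", "40-59", "60+")):
    """Map a numeric age to a category label.

    The cut-offs are hypothetical; each label covers ages below its edge,
    and the last label catches everything at or above the final edge.
    """
    for edge, label in zip(edges, labels):
        if age < edge:
            return label
    return labels[-1]


# Hypothetical sample values, NOT taken from the project's data.
purposes = ["car", "radio", "car"]
ages = [24, 25, 59, 60]

print(one_hot(purposes))
print([bin_age(a) for a in ages])
```

The same binning approach would apply to Checking and Savings once suitable cut-offs are chosen.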