7/10/2019

Introduction

  • Goal: use plotly to plot a loan prediction dataset
  • The dataset comprises 13 columns covering the applicants' background and the loan status
  • Load the dataset and show only the first 3 rows
   Loan_ID Gender Married Dependents Education Self_Employed
1 LP001002   Male      No          0  Graduate            No
2 LP001003   Male     Yes          1  Graduate            No
3 LP001005   Male     Yes          0  Graduate           Yes
  ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term
1            5849                 0         NA              360
2            4583              1508        128              360
3            3000                 0         66              360
  Credit_History Property_Area Loan_Status
1              1         Urban           Y
2              1         Rural           N
3              1         Urban           Y
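
The slides do not include the loading code, so the following is only a minimal sketch in Python of how the three rows above could be parsed. The function name `load_rows` and the inline `SAMPLE_CSV` text are illustrative assumptions; a real run would read the full training file, whose path is not given here.

```python
import csv
import io

# The three rows printed above, written as CSV text (illustration only).
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0,NA,360,1,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,Y
"""

# Columns converted to numbers; every other column stays a string.
NUMERIC_COLUMNS = ("ApplicantIncome", "CoapplicantIncome", "LoanAmount",
                   "Loan_Amount_Term")


def load_rows(text):
    """Parse CSV text into a list of dicts, mapping "NA" to None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key: (None if value == "NA" else value)
               for key, value in raw.items()}
        for column in NUMERIC_COLUMNS:
            if row[column] is not None:
                row[column] = float(row[column])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
for row in rows[:3]:  # show only the first 3 rows
    print(row["Loan_ID"], row["ApplicantIncome"], row["LoanAmount"],
          row["Loan_Status"])
```

Mapping "NA" to None keeps missing values such as the first LoanAmount out of later plots instead of treating them as text.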

Show some histograms

  • Here I use plotly to plot a histogram of the applicant income distribution
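
As a rough sketch of this step, assuming Python's plotly package, a histogram can be described as a plain figure dictionary and handed to plotly for rendering. The helper names `income_histogram_spec` and `show` and the titles are hypothetical, and the three income values come from the rows shown in the introduction.

```python
def income_histogram_spec(incomes):
    """Describe a histogram of applicant incomes as a plotly figure dict."""
    return {
        "data": [{"type": "histogram", "x": list(incomes),
                  "name": "ApplicantIncome"}],
        "layout": {
            "title": {"text": "Distribution of ApplicantIncome"},
            "xaxis": {"title": {"text": "ApplicantIncome"}},
            "yaxis": {"title": {"text": "Count"}},
        },
    }


def show(spec):
    """Render a figure dict; plotly is imported only when rendering."""
    import plotly.graph_objects as go

    go.Figure(spec).show()


# ApplicantIncome values from the three rows shown in the introduction.
spec = income_histogram_spec([5849, 4583, 3000])
print(spec["data"][0]["type"], len(spec["data"][0]["x"]))
# With plotly installed, show(spec) displays the interactive histogram.
```

Keeping the description separate from the rendering call means the figure can be inspected without opening a browser.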

Show some boxplots

  • Boxplots make it easier to compare the distribution of ApplicantIncome across Loan_Status
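
One possible way to build such a comparison, sketched in Python under the assumption that each row is a dict, is to group the values by loan status and emit one box trace per group. The function name `boxplot_by_group_spec` is hypothetical, and the sample rows are the three shown in the introduction.

```python
from collections import defaultdict

# Relevant fields from the three rows shown in the introduction.
ROWS = [
    {"ApplicantIncome": 5849, "LoanAmount": None, "Loan_Status": "Y"},
    {"ApplicantIncome": 4583, "LoanAmount": 128, "Loan_Status": "N"},
    {"ApplicantIncome": 3000, "LoanAmount": 66, "Loan_Status": "Y"},
]


def boxplot_by_group_spec(rows, value_column, group_column):
    """Build a plotly figure dict with one box trace per group."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_column)
        if value is not None:  # skip missing values
            groups[row[group_column]].append(value)
    traces = [{"type": "box", "y": values, "name": str(group)}
              for group, values in sorted(groups.items())]
    return {
        "data": traces,
        "layout": {
            "title": {"text": f"{value_column} by {group_column}"},
            "xaxis": {"title": {"text": group_column}},
            "yaxis": {"title": {"text": value_column}},
        },
    }


spec = boxplot_by_group_spec(ROWS, "ApplicantIncome", "Loan_Status")
for trace in spec["data"]:
    print(trace["name"], trace["y"])
# With plotly installed: plotly.graph_objects.Figure(spec).show()
```

Because the value column is a parameter, the same helper applies to LoanAmount or CoapplicantIncome by changing one argument.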

Show more boxplots

  • What about Loan Amount?

Show more boxplots

  • What about Coapplicant Income?

Show barplots

  • Look into the credit history. Plot barplots of how many applicants have a good credit history and how many do not.
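
A small Python sketch of the counting behind such a barplot follows. The reading of the codes (1 as a good credit history, 0 as a poor one) is an assumption, as are the label texts and the function name `credit_history_bar_spec`.

```python
from collections import Counter

# Assumption: 1 marks a good credit history and 0 a poor one.
LABELS = {"1": "Good history (1)", "0": "Poor history (0)", None: "Missing"}


def credit_history_bar_spec(values):
    """Count each Credit_History code and describe the counts as a bar chart."""
    counts = Counter(
        None if value in (None, "NA") else str(int(float(value)))
        for value in values
    )
    keys = [key for key in ("1", "0", None) if key in counts]
    return {
        "data": [{"type": "bar",
                  "x": [LABELS[key] for key in keys],
                  "y": [counts[key] for key in keys]}],
        "layout": {
            "title": {"text": "Credit_History counts"},
            "yaxis": {"title": {"text": "Number of applicants"}},
        },
    }


# Credit_History values from the three rows shown in the introduction.
spec = credit_history_bar_spec([1, 1, 1])
print(spec["data"][0]["x"], spec["data"][0]["y"])
# With plotly installed: plotly.graph_objects.Figure(spec).show()
```

Missing values get their own bar rather than being dropped, so the chart shows how many applicants have no recorded history.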

Show more barplots

  • Let us look at the property area with barplots:

Show more barplots

  • Let us look at the gender with barplots:

Show more barplots

  • Let us look at the marriage status with barplots:

Show more barplots

  • Let us look at the education level with barplots:

Show more barplots

  • Let us look at the self-employment status with barplots:
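
Since the categorical slides all follow the same pattern, one frequency helper can be looped over the columns. This is a hedged Python sketch: `frequency_bar_spec` is a hypothetical name, and the sample rows are the three from the introduction.

```python
from collections import Counter

CATEGORICAL_COLUMNS = ("Property_Area", "Gender", "Married", "Education",
                       "Self_Employed")

# Categorical fields from the three rows shown in the introduction.
ROWS = [
    {"Property_Area": "Urban", "Gender": "Male", "Married": "No",
     "Education": "Graduate", "Self_Employed": "No"},
    {"Property_Area": "Rural", "Gender": "Male", "Married": "Yes",
     "Education": "Graduate", "Self_Employed": "No"},
    {"Property_Area": "Urban", "Gender": "Male", "Married": "Yes",
     "Education": "Graduate", "Self_Employed": "Yes"},
]


def frequency_bar_spec(rows, column):
    """Count each category in a column and describe the counts as a bar chart."""
    counts = Counter("Missing" if row[column] is None else row[column]
                     for row in rows)
    categories = sorted(counts)
    return {
        "data": [{"type": "bar", "x": categories,
                  "y": [counts[category] for category in categories]}],
        "layout": {
            "title": {"text": f"{column} frequencies"},
            "xaxis": {"title": {"text": column}},
            "yaxis": {"title": {"text": "Count"}},
        },
    }


specs = {column: frequency_bar_spec(ROWS, column)
         for column in CATEGORICAL_COLUMNS}
for column, spec in specs.items():
    print(column, dict(zip(spec["data"][0]["x"], spec["data"][0]["y"])))
# With plotly installed: plotly.graph_objects.Figure(specs["Gender"]).show()
```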

Summary

  • I did some simple exploratory data analysis on the loan prediction training dataset using plotly
  • For continuous variables, I used histograms and boxplots
  • For categorical variables, I used barplots for the frequencies