Introduction

Prosper is America’s first marketplace lending platform, with over $10 billion in funded loans.

Prosper allows people to invest in each other in a way that is financially and socially rewarding. On Prosper, borrowers list loan requests between $2,000 and $35,000, and individual investors invest as little as $25 in each loan listing they select. Prosper handles the servicing of the loan on behalf of the matched borrowers and investors.

Prosper Funding LLC is a wholly-owned subsidiary of Prosper Marketplace, Inc.

Prosper Marketplace is backed by leading investors including Sequoia Capital, Francisco Partners, Institutional Venture Partners, and Credit Suisse NEXT Fund.

This exploratory data analysis covers 113,937 loan records listed between November 2005 and early 2014.

The dataset contains 81 variables.

This project is divided into three analytical segments (Univariate Plots, Bivariate Plots, and Multivariate Plots), followed by a Reflection segment that summarizes my experience and thoughts throughout this course.

Loading the Dataset and Setting Global Options

Set working directory

Getting and loading data

Prosper Loan data can be downloaded from this location: https://s3.amazonaws.com/udacity-hosted-downloads/ud651/ProsperLoanData.csv
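
The loading code is not shown in this report, so the following is only a minimal sketch of how the file could be fetched and read. The helper name `load_prosper`, the local file name, and the object name `loans` are assumptions made for illustration; only the URL comes from the text above.

```r
# URL quoted in the text above.
prosper_url <- "https://s3.amazonaws.com/udacity-hosted-downloads/ud651/ProsperLoanData.csv"

# Hypothetical helper (not from the original report): download the CSV once,
# then read it from the local copy on later runs.
load_prosper <- function(path = "ProsperLoanData.csv", url = prosper_url) {
  if (!file.exists(path)) {
    download.file(url, destfile = path, mode = "wb")
  }
  # stringsAsFactors = TRUE matches the Factor columns in the structure
  # listing; since R 4.0.0 the default is FALSE.
  read.csv(path, stringsAsFactors = TRUE)
}

# loans <- load_prosper()   # uncomment to fetch and load the full dataset
```

Keeping the download behind a `file.exists()` check avoids fetching the file again every time the report is knitted.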

Dataset Structure

Let’s take a look at the structure and contents of the dataset:

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
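
A listing like the one above is what `str()` prints for a data frame. As a small sketch, here is the same call on a toy stand-in built from three columns of the first three records; the object name `loans` is an assumption, and the toy frame is not the real dataset.

```r
# Toy stand-in for the full data: three columns from the first three records
# shown in this report (the object name `loans` is an assumption).
loans <- data.frame(
  Term = c(36L, 36L, 36L),
  LoanStatus = factor(c("Completed", "Current", "Completed")),
  BorrowerAPR = c(0.16516, 0.12016, 0.28269)
)

# str() prints one line per column: name, type, and the first few values.
# For a factor, the integers after the level list are internal level codes,
# not the values themselves.
str(loans)

# dim() returns the observation and variable counts from the header line.
dim(loans)
```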

The structure contains a lot of information. Let’s take a look at the column (variable) names.

Column names

##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"

The listing confirms that the dataset has 81 variables.
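
The column listing and the count can be obtained with `names()` and `ncol()`. The sketch below uses a toy one-row frame with a few of the real column names. It also shows one likely reason for the doubled dots in names such as `ProsperRating..numeric.`: `read.csv()` sanitizes headers the same way `make.names()` does. The raw header strings are an assumption, since the report does not show the CSV header.

```r
# Toy one-row frame using a few column names from the listing above
# (the object name `loans` is an assumption).
loans <- data.frame(
  ListingKey = "1021339766868145413AB3B",
  Term = 36L,
  LP_CustomerPayments = 11396.14,
  LP_ServiceFees = -133.18
)

names(loans)  # character vector of column names
ncol(loans)   # number of variables

# Assumed raw CSV headers: spaces and parentheses are not valid in R names,
# so make.names() replaces each of them with a dot.
raw_headers <- c("ProsperRating (numeric)", "TradesNeverDelinquent (percentage)")
clean <- make.names(raw_headers)
print(clean)
```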

Top three records from the dataset

##                ListingKey ListingNumber           ListingCreationDate
## 1 1021339766868145413AB3B        193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1       1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A         81716 2007-01-05 15:00:47.090000000
##   CreditGrade Term LoanStatus          ClosedDate BorrowerAPR BorrowerRate
## 1           C   36  Completed 2009-08-14 00:00:00     0.16516        0.158
## 2               36    Current                         0.12016        0.092
## 3          HR   36  Completed 2009-12-17 00:00:00     0.28269        0.275
##   LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1       0.138                      NA            NA              NA
## 2       0.082                  0.0796        0.0249          0.0547
## 3       0.240                      NA            NA              NA
##   ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1                      NA                                 NA
## 2                       6                     A            7
## 3                      NA                                 NA
##   ListingCategory..numeric. BorrowerState   Occupation EmploymentStatus
## 1                         0            CO        Other    Self-employed
## 2                         2            CO Professional         Employed
## 3                         0            GA        Other    Not available
##   EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1                        2                True             True
## 2                       44               False            False
## 3                       NA               False             True
##                  GroupKey              DateCreditPulled
## 1                         2007-08-26 18:41:46.780000000
## 2                                   2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
##   CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1                   640                   659     2001-10-11 00:00:00
## 2                   680                   699     1996-03-18 00:00:00
## 3                   480                   499     2002-07-27 00:00:00
##   CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1                  5               4                         12
## 2                 14              14                         29
## 3                 NA              NA                          3
##   OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1                     1                          24                    3
## 2                    13                         389                    3
## 3                     0                           0                    0
##   TotalInquiries CurrentDelinquencies AmountDelinquent
## 1              3                    2              472
## 2              5                    0                0
## 3              1                    1               NA
##   DelinquenciesLast7Years PublicRecordsLast10Years
## 1                       4                        0
## 2                       0                        1
## 3                       0                        0
##   PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1                         0                      0                0.00
## 2                         0                   3989                0.21
## 3                        NA                     NA                  NA
##   AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1                    1500          11                               0.81
## 2                   10266          29                               1.00
## 3                      NA          NA                                 NA
##   TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 1                       0              0.17 $25,000-49,999
## 2                       2              0.18 $50,000-74,999
## 3                      NA              0.06  Not displayed
##   IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 1             True            3083.333 E33A3400205839220442E84
## 2             True            6125.000 9E3B37071505919926B1D82
## 3             True            2083.333 6954337960046817851BCB2
##   TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1                NA                         NA                    NA
## 2                NA                         NA                    NA
## 3                NA                         NA                    NA
##   ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
##   ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1                       NA                          NA
## 2                       NA                          NA
## 3                       NA                          NA
##   ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1                          NA                         0
## 2                          NA                         0
## 3                          NA                         0
##   LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1                            NA                         78      19141
## 2                            NA                          0     134815
## 3                            NA                         86       6466
##   LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1               9425 2007-09-12 00:00:00                Q3 2007
## 2              10000 2014-03-03 00:00:00                Q1 2014
## 3               3001 2007-01-17 00:00:00                Q1 2007
##                 MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA             330.43            11396.14
## 2 1D13370546739025387B2F4             318.93                0.00
## 3 5F7033715035555618FA612             123.32             4186.63
##   LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1                         9425            1971.14        -133.18
## 2                            0               0.00           0.00
## 3                         3001            1185.63         -24.20
##   LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1                 0                     0                   0
## 2                 0                     0                   0
## 3                 0                     0                   0
##   LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1                               0             1               0
## 2                               0             1               0
## 3                               0             1               0
##   InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1                          0                           0       258
## 2                          0                           0         1
## 3                          0                           0        41
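
The first rows of a data frame can be pulled with `head()`. This is a small sketch on a toy frame holding three columns of the first four records from the structure listing; the names `loans` and `top3` are assumptions.

```r
# Toy frame: three columns of the first four records from the structure
# listing (the object name `loans` is an assumption).
loans <- data.frame(
  ListingNumber = c(193129L, 1209647L, 81716L, 658116L),
  Term = c(36L, 36L, 36L, 36L),
  LoanOriginalAmount = c(9425L, 10000L, 3001L, 10000L)
)

# head(x, n) returns the first n rows; the default is n = 6.
top3 <- head(loans, 3)
print(top3)
```

With 81 columns, the real output wraps across many lines, which is why the listing above repeats the row labels 1 to 3 for each group of columns.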

Summary of the dataset

##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

Dimension of the dataset

## [1] 113937     81

There are 81 columns (variables) and 113,937 rows (loan listings) in the Prosper Loan Data.

Install and Load Necessary Libraries

Cleaning Data

Convert CreditScoreRangeLower and CreditScoreRangeUpper into a single CreditScore value (calculate average of the two variables).
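A minimal sketch of this step, assuming the data frame is called `loans` (a hypothetical name; the toy rows below only stand in for the real data):

```r
# Toy stand-in for the Prosper data; `loans` is a hypothetical name.
loans <- data.frame(
  CreditScoreRangeLower = c(640, 680, 720, NA),
  CreditScoreRangeUpper = c(659, 699, 739, NA)
)

# Midpoint of the reported range; rows with a missing bound stay NA.
loans$CreditScore <- (loans$CreditScoreRangeLower + loans$CreditScoreRangeUpper) / 2

loans$CreditScore
```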

Convert ListingCategory from numeric to factor variable using the keys given in the Google Spreadsheet.

##              Debt Consolidation                Home Improvement 
##                           58308                            7433 
##                        Business      Personal\n            Loan 
##                            7189                            2395 
##                     Student Use                            Auto 
##                             756                            2572 
##                 Baby & Adoption                            Boat 
##                             199                              85 
## Cosmetic\n            Procedure                 Engagement Ring 
##                              91                             217 
##                      Green Loan              Household Expenses 
##                              59                            1996 
##    Large\n            Purchases                  Medical/Dental 
##                             876                            1522 
##                      Motorcycle                              RV 
##                             304                              52 
##                           Taxes                        Vacation 
##                             885                             768 
##                         Wedding                           Other 
##                             771                           10494 
##                  Not Applicable 
##                           16965
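
The conversion can be sketched with `factor()`. The lookup below covers only a few codes and is my reading of the variable definitions, so treat the code-to-label pairs as assumptions and check them against the spreadsheet before use:

```r
# Hypothetical partial lookup: numeric code -> label (verify each pair
# against the variable definitions spreadsheet).
category_labels <- c(
  "0" = "Not Applicable",
  "1" = "Debt Consolidation",
  "2" = "Home Improvement",
  "3" = "Business",
  "7" = "Other"
)

# Toy values standing in for ListingCategory..numeric.
codes <- c(1, 1, 3, 0, 7, 2, 1)

listing_category <- factor(
  codes,
  levels = as.integer(names(category_labels)),
  labels = unname(category_labels)
)

table(listing_category)
```

Any code missing from the lookup would become `NA`, which makes gaps in the mapping easy to spot.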

Convert dates to date class using lubridate’s ymd_hms() function
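
The report uses lubridate’s ymd_hms() here. The sketch below swaps in base R’s `as.POSIXct()` so that it runs without extra packages; the UTC time zone is an assumption, and the two strings are examples in the format the dataset uses:

```r
# Timestamp strings in the dataset's format (with and without fractional seconds).
raw_dates <- c("2005-11-09 20:44:28.847000000", "2014-01-22 00:00:00")

# %OS reads seconds including any fractional part; UTC is an assumed time zone.
parsed <- as.POSIXct(raw_dates, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

class(parsed)
format(parsed, "%Y-%m-%d")
```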

Convert LoanOriginationQuarter to begin with the year (using tidyr) so that any plot axis sorts the quarters in chronological order.
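
The idea can be sketched without tidyr by swapping the two parts with a regular expression. This is a base R alternative, not the code used in the report, and the input values are toy examples:

```r
quarters <- c("Q4 2013", "Q1 2014", "Q3 2012", "Q4 2013")

# Swap "Qn YYYY" into "YYYY Qn" so that alphabetical order matches time order.
year_first <- sub("^(Q[1-4]) ([0-9]{4})$", "\\2 \\1", quarters)

# Sorted unique values become the factor levels used by plot axes.
quarter_f <- factor(year_first, levels = sort(unique(year_first)))

levels(quarter_f)
```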

CreditGrade stored the credit rating before 2009, while ProsperRating..Alpha. stored it from 2009 onward. I combined the two variables into one. They are not exactly the same measure, but they are close enough for the combined variable to be meaningful and to make the plots easier to read.

Let’s look at the structure of newly created CreditRating

##  Ord.factor w/ 7 levels "AA"<"A"<"B"<"C"<..: 4 1 7 1 5 3 6 4 2 2 ...
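
One way to build such a combined variable is sketched below on toy rows. The rule used here (take CreditGrade when it is non-empty, otherwise the Prosper rating, and let grades outside the seven levels such as "NC" become NA) is an assumption, not necessarily the exact rule used in the report:

```r
# Toy rows; `loans` is a hypothetical name for the data frame.
loans <- data.frame(
  CreditGrade           = c("C", "", "HR", "", "NC"),
  ProsperRating..Alpha. = c("", "A", "", "AA", ""),
  stringsAsFactors = FALSE
)

# Assumed rule: prefer the pre-2009 grade when present, else the Prosper rating.
combined <- ifelse(loans$CreditGrade != "",
                   loans$CreditGrade,
                   loans$ProsperRating..Alpha.)

# Ordered from least risky (AA) to most risky (HR); other values become NA.
loans$CreditRating <- factor(
  combined,
  levels  = c("AA", "A", "B", "C", "D", "E", "HR"),
  ordered = TRUE
)

str(loans$CreditRating)
```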

Format Date Columns As.Date

Split date variables by day, month and year to enable easy analysis of dates

Merge the Split Dates with Prosper Loan Data
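
The split and merge steps can be sketched as follows. The dates are toy values, `loans` is a hypothetical data frame name, and the report's own code may differ:

```r
# Toy data with an already parsed date column.
loans <- data.frame(
  ListingNumber       = c(1L, 2L, 3L),
  ListingCreationDate = as.Date(c("2007-08-26", "2014-02-27", "2013-12-31"))
)

# Split the date into numeric year, month and day columns.
date_parts <- data.frame(
  ListingYear  = as.integer(format(loans$ListingCreationDate, "%Y")),
  ListingMonth = as.integer(format(loans$ListingCreationDate, "%m")),
  ListingDay   = as.integer(format(loans$ListingCreationDate, "%d"))
)

# Merge the new columns back; rows line up because the order is unchanged.
loans <- cbind(loans, date_parts)

names(loans)
```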

View New Prosper Loan Data

Modified Dataset

New Columns

##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"                          
## [82] "CreditScore"                        
## [83] "ListingCategory"                    
## [84] "LoanOriginationQuarterF"            
## [85] "CreditRating"                       
## [86] "ListingYear"                        
## [87] "ListingMonth"                       
## [88] "ListingDay"                         
## [89] "OriginationYear"                    
## [90] "OriginationMonth"                   
## [91] "OriginationDay"                     
## [92] "ClosedYear"                         
## [93] "ClosedMonth"                        
## [94] "ClosedDay"

Dimension of new dataset

## [1] 113937     94

Prosper Loan Data now has 94 variables: the original 81 variables plus 13 newly created variables.

Univariate Plots

Let’s plot some of the variables and see what information they have.

The first variable I would like to look at is LoanOriginalAmount.

The bulk of the loans were below $15,000. Most loans were also in multiples of $5,000, as seen in the spikes at $5,000, $10,000, $15,000, $20,000, $25,000, $30,000 and $35,000. This may suggest that loans were not sized to the value of an underlying asset but were mostly taken to refinance existing debts. We will see more when I plot the reasons why loans were taken.

Plot of ListingYear

Most loans in the dataset were listed in 2013, followed by 2012. Apart from 2005, which has only a handful of listings, the fewest loans were listed in 2009. Let’s look at the percentage distribution.

## Percentage Distribution of Listing Year
##         2005         2006         2007         2008         2009 
## 0.0002018659 0.0545301351 0.1014332482 0.0988528748 0.0193615770 
##         2010         2011         2012         2013         2014 
## 0.0485355942 0.1004239185 0.1716387126 0.3108121155 0.0942099581

31% of all loans were listed in 2013. There was a big dip in 2009, after which listings picked up from 2010 and peaked in 2013.
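
A percentage distribution like the one above can be produced with `table()` and `prop.table()`. A minimal sketch on toy years (not the real counts):

```r
# Toy listing years standing in for the ListingYear column.
listing_year <- c(2012, 2013, 2013, 2009, 2013, 2012, 2011, 2013, 2010, 2013)

# Counts per year, converted to shares that sum to 1.
share <- prop.table(table(listing_year))

# Express as percentages with one decimal place.
round(100 * share, 1)
```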

Plot of ListingCreationMonth

Loans listed in January are more than in other months. Listings were lowest in March and April and rose steadily through December.

Plot of ListingCreationDate

Listings were lowest on the 31st day of the month, which is expected since only seven months have a 31st.

Plot of Loan Origination Year

More loans were originated in 2013 than any other year. Let’s take a look at the percentage distribution.

## Percentage Distribution of Origination Year
##         2005         2006         2007         2008         2009 
## 0.0001930892 0.0518356636 0.1005819005 0.1013893643 0.0179660690 
##         2010         2011         2012         2013         2014 
## 0.0496063614 0.0985456875 0.1716123823 0.3014385143 0.1068309680

30% of all loans were originated in 2013, followed by 17% in 2012. Only about 1.8% were originated in 2009. It will be interesting to look at what happened to Prosper in 2009.

Plot of Loan Origination Month

Loans originated in January were more than in other months with a dip in March and April.

More loans were also originated on the 30th of the month. This could be due to employees’ attempts to meet performance requirements for the month.

Plot of Loan Closed Year

It makes sense that more loans were closed in 2013 than in any other year, given that more loans were originated in 2013 as well.

Plot of Loan Closed Month

More loans were closed on the 2nd of the month than other days.

Plot of Loans Closed Day

More loans were also closed on the 30th of the month.

Let’s take a step further and view Listing Year, Origination Year and Closed Year together.

Plots and Grid Arrangement of Origination, Listing and Closed Year

More loans in the data were listed, originated, and closed in 2013 than in any other year.

Plot of Borrowers Occupation

Below is a visualization of borrowers’ occupations:

‘Other’ and ‘Professional’ stand out in the occupation chart. This suggests that many borrowers chose “Other” or “Professional” instead of stating their actual occupation.

Plot of Borrowers States

California has the largest number of borrowers. This could be because Prosper was founded and is based in California, and because California is one of the states with the highest state debt per capita. The other popular states include Florida, New York, Texas and Illinois.

Next let’s look at some financial information:

Plot of Borrowers IncomeRange

This plot approximates a normal distribution, with borrowers with zero income and those not employed making up the smallest groups, as expected. I do not think loans were given to people with $0 income or to the unemployed. These could be borrowers who lost their jobs after getting the loans or whose loans are secured by other assets.

Plot of CreditScore

Median credit score was 700

Plot of ProsperRating..Alpha

This grid shows ‘CreditGrade’, the pre-2009 rating, and ‘ProsperRating’, the post-2009 rating, side by side. One thing that stands out in both plots is that there are fewer borrowers at both ends of the credit rating scale. A likely reason is that people with excellent credit are financially stable and do not usually take loans, while people at the tail end of the scale don’t always get approved for credit. To me, that makes sense.

Plot of Borrowers APR

The bulk of the loans seem to be near the 0.2 mark, which coincides with the credit rating histograms showing that the majority of borrowers are in the middle of the risk ratings. There is an unexpected spike in the 0.35-0.37 bin, which points to an unusually common rate, primarily for higher-risk borrowers.

Plot of Lender Yield

The lender yield plot is similar to the borrower APR plot because they’re two sides of the same coin. I do notice, however, that the peak count is slightly lower than the one in the borrower APR plot, and I presume it is because of the losses incurred when borrowers default on loans or when loans are charged off after late payments.

Plot of DebtToIncomeRatio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Most of the borrowers have a debt-to-income ratio below 0.8, with the median being 0.22.

Most of the borrowers have 0 current delinquencies which is a good thing.

Plot of LoanStatus

‘Current’ is the largest status group, with roughly half of all loans still ongoing, which is indicative of Prosper’s booming growth.

Debt Consolidation is by far the most popular listing category. The remaining categories, leaving aside the ambiguous ‘Other’ and ‘Not Applicable’, are all well below the 10,000 mark.

Time Series Plots

I’m going to do some Time Series plots to see how loans have performed since Prosper launched and to show any interesting trends in Prosper’s loan business.

Let’s take a look at the summary of the ‘LoanStatus’ variable to see what the grouping/coloring will look like:

##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304
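
For grouping and coloring, the twelve statuses can be collapsed into a few broader groups. The sketch below is one possible grouping and an assumption on my part: it folds ‘Chargedoff’ into ‘Defaulted’, ‘FinalPaymentInProgress’ into ‘Current’, all ‘Past Due’ buckets into one, and leaves ‘Cancelled’ as NA:

```r
# Toy vector covering the kinds of status values seen in the summary.
status <- c("Current", "Chargedoff", "Past Due (1-15 days)", "Completed",
            "Defaulted", "FinalPaymentInProgress", "Cancelled",
            "Past Due (>120 days)")

# Assumed grouping rules, applied in order; anything unmatched becomes NA.
status_group <- ifelse(
  grepl("^Past Due", status), "Past Due",
  ifelse(status %in% c("Defaulted", "Chargedoff"), "Defaulted",
  ifelse(status %in% c("Current", "FinalPaymentInProgress"), "Current",
  ifelse(status == "Completed", "Completed", NA))))

table(status_group, useNA = "ifany")
```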

Plot of LoanOriginationQuarter

This plot provides us with information on the growth of the company through its increasing loan originations. We can also see the performance of loans over time.

This plot shows the same time series data from the plot above, but with a line instead of a stacked bar chart.

Bivariate Plots

Plot of CreditScore vs ProsperRating..Alpha

The boxplots above show the relationship between a borrower’s Prosper rating (note - this is only post-2009 data) and their credit score, and the variation in each rating category. A person’s credit score is one of the key factors in determining their Prosper Rating, so it’s no surprise that as we climb the rating categories, the credit score of the borrowers also tends to increase. ‘HR’ has a slightly higher median and IQR than ‘E’ despite being a riskier category - in fact, it’s on par with borrowers with a ‘D’ rating.

Plot of BorrowerAPR vs ProsperRating..Alpha

The boxplots above show the relationship between borrower’s Prosper rating and their assigned Annual Percentage Rate (APR). It’s very clear that as we go down the ladder of risk - from a ‘High Risk’ to an ‘AA’ rating - the APR for the borrower reduces drastically. In fact, looking at the results of a by() function, it goes from a median APR of 35.8% for High Risk all the way to a median value of 9% for ‘AA’.

The variation in APRs also decreases as the loans get less risky, as displayed by the decreasing size of the boxes in the boxplots when going from ‘HR’ to ‘AA’. There is also a reduction in the number of outliers, which is visible in the shortening lines of yellow rings.
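
The per-rating medians mentioned above come from a by() call in the report. A comparable summary can be sketched with `tapply()`, which returns the same group-wise medians as a named vector. The APR values below are toy numbers, not the real data:

```r
# Toy APRs and ratings; the real columns are BorrowerAPR and ProsperRating..Alpha.
apr    <- c(0.36, 0.35, 0.30, 0.09, 0.08, 0.10, 0.20, 0.22)
rating <- factor(
  c("HR", "HR", "HR", "AA", "AA", "AA", "C", "C"),
  levels = c("AA", "C", "HR")
)

# Median APR within each rating group.
median_apr <- tapply(apr, rating, median)

median_apr
```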

Plot of Quarter vs LoanStatusGroup

The line graph above shows the number of loans that were defaulted over time. This is important for Prosper because they can see how frequently bad loans are made, and more importantly, to judge whether any policies - like the minimum credit score - are improving the likelihood of payment.

The 2 times the line veers below the 200 mark are misleading. The first is because of the ‘quiet’ period mentioned before, and therefore expected. The second one, however, in 2013, is because most of the loans in that period are with the ‘Current’ or ‘FinalPaymentInProgress’ status, and there just hasn’t been enough time to know whether loans are ‘Completed’ or ‘Defaulted’. Over time, that line should go higher.

In fact, a better way to show defaulted loans would be to show the rate at which loans are being defaulted. Let’s do that next:

This looks more systematic. I can see that between 2006 and 2008, the default rate hovered between 30%-40% - a pretty high rate. Once again, we see that drop, but it does not go to zero this time, and we can now see that during that period (which also happens to be after the financial crisis) there were borrowers unable to pay back their loans.
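
A default rate per quarter can be sketched as the share of loans in each origination quarter that ended badly. Counting both ‘Defaulted’ and ‘Chargedoff’ as defaults, and dividing by all loans in the quarter, are assumptions here; the rows are toy data:

```r
# Toy loans: origination quarter and final status.
toy <- data.frame(
  quarter = c(rep("2007 Q1", 4), rep("2010 Q1", 5)),
  status  = c("Defaulted", "Chargedoff", "Completed", "Completed",
              "Completed", "Completed", "Completed", "Chargedoff", "Completed")
)

# Assumed definition of a default: Defaulted or Chargedoff.
is_default <- toy$status %in% c("Defaulted", "Chargedoff")

# The mean of a logical vector is the share of TRUE values, i.e. the rate.
default_rate <- tapply(is_default, toy$quarter, mean)

round(100 * default_rate, 1)
```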

Plot of DelinquenciesLast7Years vs AmountDelinquent

The overplotting and general dispersion of the data don’t reveal much of a trend in this plot.

Multivariate Plots

Plot of DebtToIncomeRatio vs BorrowerAPR

This is a great plot with a lot of information. Here we have a scatter plot of the borrower’s APR and the borrower’s debt-to-income ratio, with the colors describing the risk category given to the particular loan. I’ve given the legend a continuous color scale despite the rating being a discrete variable because it displays the progression from a safe green to a risky red. I’ve also decided to include all points on the y-axis (including outliers) to show the range of rates, and I’ve limited the x-axis by removing the 0.05% of points furthest away from the median (i.e. removing outliers that stretch the graph).

The first thing I noticed and found interesting was that ‘A’ category loans seem to have lower APRs and a smaller range of debt-to-income ratios, both of which indicate less risk. The rest of the plot follows the color palette, and APR increases as the rating gets riskier. Another thing is that most people tend to have debt-to-income ratios below 1, regardless of risk category. Also, there is an unusual horizontal line in the ‘HR’ category that extends past 1 and all the way to 1.5, while lower ratings tend to be sparse in the 1.0+ debt-to-income ratio range.
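
Trimming an axis in this way can be sketched with `quantile()`. Because debt-to-income ratios are right-skewed, the sketch assumes the points furthest from the median are all at the top and cuts at the 99.95th percentile; the values are synthetic and the exact cutoff rule in the report may differ:

```r
# Synthetic ratios: 1999 ordinary values plus one extreme outlier.
dti <- c(seq(0.01, 1, length.out = 1999), 10.01)

# Upper limit that leaves out the top 0.05% of values (assumed rule).
upper <- quantile(dti, probs = 0.9995, na.rm = TRUE)

# Values kept inside the limit; the outlier is dropped.
kept <- dti[dti <= upper]

length(dti) - length(kept)
```

The resulting `upper` value can then be used as the x-axis limit in whatever plotting call draws the scatter plot.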

Plot of LoanOriginalAmount vs LenderYield

This plot shows the relationship between the lender yield on a loan and the amount that the borrower has borrowed. I then made individual graphs to show that relationship based on the status of the loan - Defaulted, Past Due, Current and Completed - and finally colored it based on risk rating.

Plot of CurrentDelinquencies vs BorrowerAPR

This plot was made to see if there were any distinct differences between completed and defaulted loans when it came to current delinquencies. Unfortunately, there don’t seem to be any tell-tale signs, and both plots look pretty similar. However, I do notice that higher-rated loans seem less diverse in terms of delinquencies and APR, and are typically clustered in the bottom left corner. As the loans get riskier, the points get more varied and tend to be spread all over the graph.

Final Plots

Plot One: Quarter vs LoanStatusGroup

I have chosen this plot because of its combination of detail and simplicity. It makes for an easy way to evaluate the performance of Prosper loans. I’m going to compare it as pre-2009 and post-2009, because 2009 was when they went into a ‘quiet’ period and changed their business model and also mandated a minimum credit score of 640. This plot is one way of visually seeing whether their changes have resulted in a more Prosperous lending platform.

First let’s look at pre-2009. They were still a young company at the time, and we can see that with the sub-5000 loans per quarter figure. More importantly, though, all the loans originated at the time are either completed or defaulted (i.e. none are still ongoing). Now I can compare the relative sizes of the red and green bars, and I can easily tell that approximately half, or a bit less than half the loans that were granted, defaulted.

That’s not good, especially when they have to convince investors that they’re making solid investments. Now let’s look at post-2009. Right when they restarted servicing loans, for about the next year, we can see that the size of the red bar is much smaller relative to the green bar. That tells me that their minimum credit policy seems like it’s working - defaults look pretty low in number.

I chose to look at only the next year because there seem to be no, or an insignificant amount of loans still currently active. After that - 2011 onwards - we see the number of loans being originated rise tremendously. The red bar to green bar ratio seems to be increasing slowly, but it’s difficult to tell with the growing ‘Current’ blue bar in between. But a few questions arise - how many of those current loans will end up in the green, completed group in the future? How many will instead dive into the yellow of ‘Past Due’? And out of those, how many will make it out and enter green, and how many will fall into the dreadful red of ‘Defaulted’?

It’s difficult to say whether their new policies have improved investment quality - the first year certainly suggests so, but scaling up quickly invariably leads to some new problems. Only time can tell what color the blues in the plot will convert to.

Plot Two: Quarter vs LoanStatusGroup

I’ve chosen this plot because it’s clear-cut and delivers its message precisely. The graph shows the default rate (in percentage) of loans over the years. In a way, it continues from the Plot One above, accurately showing the default rates instead of approximating from a colored bar. And I can validate some of the estimations I made earlier.

Pre-2009, we can see that the rate generally hung around the 30-40% mark, which is considerably high and a definite area for improvement. 2008 Quarter 4 was when Prosper stopped making new loans, which lasted until 2009 Quarter 3. That might be the reason for the steep drop during that period. However, the interesting thing is that they continued performing at that low default rate for the entirety of 2010, when they restarted their service with new policies.

During 2011 it went back up to around 30% and stayed in that region up until 2012. It is important to point out that from 2011 onwards there are loans that are still running, so we cannot draw firm conclusions from the data. This is especially true from 2013 onwards, where most of the loans are still ongoing and very few are completed, defaulted, or past due.

Hence, that downward spike occurring around the end of 2012 until 2014 is quite misleading. However, we can say that default rates for complete data (pre-2011) have improved, as they are no longer touching 40%. The hovering around 30% is still quite unfavorable, though, and I’m sure Prosper would like to see that number drop over the coming years.

Plot Three: LoanOriginalAmount vs LenderYield vs CreditRating

This plot was chosen to answer some questions about Prosper loans. It shows that loans in the Defaulted category are almost all $25,000 or below; very few exceed $25,000. In the Current category, however, I see many more loans past the $25,000 mark and even past the $30,000 mark, veering towards the maximum of $35,000.

The Current category shows neat, ordered rows of colors. I noted that the ‘A’ rating has lower yields. This is understandable, as borrowers with a good credit rating will have a lower APR, which returns lower yields. Could this be one of the reasons why lenders love to book subprime credit? That’s the million-dollar question. Also, borrowers with a riskier rating got approved for lower loan amounts, which makes sense considering that they may be under tremendous financial pressure.

Reflection

This was somewhat challenging, and much more difficult than I thought it would be, but it was a good way to get a firm grasp of exploratory data analysis with R, and in the end it was incredibly rewarding. Before this project, I had done some EDA work on “marriage age”, death by gun violence, etc., but with this project I was able to touch many knowledge areas of EDA.

Prosper is a peer-to-peer lending platform whose main business rival is Lending Club. Prior to this project, I had not heard of Prosper. I explored Prosper data through the eyes of one of the three main stakeholders: borrowers, investors, and the company itself.

Overplotting was one of the challenges I encountered, and I used coloring to address it.

This project made me appreciate good visualizations as a way to understand the underlying information in a dataset. The experience gained with this dataset definitely improved my EDA skills.

There are a number of different ways to take this project further. Firstly, I’ve focused on a small subset of the variables available in the dataset, and there is a vast amount of data I’ve chosen not to explore. I think I’d like to explore the investors’ side a bit more: look at investor profits and losses and their general activity in the peer-to-peer lending industry. Also, I would like to learn about the kinds of plots and graphs specifically used by the finance industry, so that I can incorporate that knowledge into any future datasets I may explore. I hope to apply the lessons learned here in the rest of this program and its projects.