Peer-to-peer lending platforms have become important financial instruments for obtaining high returns with a “controlled” level of risk. They offer the opportunity to lend money to borrowers who are ranked according to their probability of default, i.e. the probability that they will be unable to repay the debt. Websites like Lending Club and Prosper put lenders and borrowers in direct contact, matching the demand and supply of loans and avoiding the transaction costs banks usually charge. For this reason, peer-to-peer lending is a very successful business model, with exponential growth in loan origination. To cite one number, the biggest platform on the market, Lending Club, has lent roughly 5 billion dollars since its start in 2007. However, several risks are hidden in the process of peer-to-peer lending. Diversification is the key strategy for obtaining high returns and avoiding unnecessary hazards, such as the default of a large share of the loans in a portfolio. An important question should also be addressed before investing: which strategy should a lender follow to optimize the expected return for a given level of risk? The aim of this project is to find this strategy with methods borrowed from portfolio management and machine learning techniques.

Consider the model used by Lending Club. Every loan grade (from A to G) is the result of a formula that takes into account the borrower’s credit score together with several indicators of credit risk from the borrower’s credit report and loan application. However, this formula is not public, and a lender must rely on the information provided by the platform. Lending Club releases historical loan statistics going back to 2007. The data can be freely downloaded at https://www.lendingclub.com/info/download-data.action. A first analysis of the data from 2007 to 2011 reveals that the average default rate varies from 5.8% for a grade A loan to an impressive 29.5% for a grade G loan (see Figure 1 and its caption for the methodology I used), which provides important information for diversifying a portfolio of loans.

Grade            A        B        C        D        E        F        G
Bad-loan ratio   0.05824  0.10960  0.15314  0.19101  0.21841  0.26660  0.29487

Figure 1. Share of good loans (fully paid and current) versus bad loans (defaulted and charged off, i.e. loans not expected to pay back the money borrowed) for each grade. The table lists the fraction of bad loans in each risk group.
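
The per-grade fractions in the table can be reproduced with a simple count of bad loans over all classified loans. The following is only a minimal sketch of that calculation, not the code used for Figure 1: the status labels, the function name `bad_loan_ratio_by_grade`, and the toy records are assumptions made for illustration, and the actual Lending Club export may use different column values.

```python
from collections import defaultdict

# Assumed status labels; the real export may spell them differently.
BAD_STATUSES = {"Default", "Charged Off"}
GOOD_STATUSES = {"Fully Paid", "Current"}


def bad_loan_ratio_by_grade(loans):
    """Return {grade: fraction of bad loans} for (grade, status) pairs."""
    bad = defaultdict(int)
    total = defaultdict(int)
    for grade, status in loans:
        if status not in BAD_STATUSES and status not in GOOD_STATUSES:
            continue  # ignore statuses outside the two groups (e.g. late loans)
        total[grade] += 1
        if status in BAD_STATUSES:
            bad[grade] += 1
    return {grade: bad[grade] / total[grade] for grade in sorted(total)}


# Hypothetical toy records, not taken from the Lending Club files.
sample_loans = [
    ("A", "Fully Paid"), ("A", "Fully Paid"), ("A", "Current"), ("A", "Charged Off"),
    ("G", "Fully Paid"), ("G", "Default"), ("G", "Late (31-120 days)"),
]
ratios = bad_loan_ratio_by_grade(sample_loans)
print(ratios)
```

Loans whose status falls in neither group are skipped here; whether to count them as good, bad, or not at all is a modeling choice that changes the resulting ratios.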

Moreover, the grades given by the platform hide a large amount of information. How can we know whether it is better to invest in a grade A loan or a riskier grade B loan? A standard measure of the performance of a portfolio of equities is the Sharpe Ratio (SR), defined here as the average portfolio return divided by its standard deviation. A similar index can be calculated for the loans in the Lending Club database. Figure 2 shows that loan performance does not increase linearly with the category of default risk. These two preliminary plots show that a thorough analysis of the loan database can give important insights for building an algorithmic investing strategy that lenders on these online platforms can use. In the project I plan to classify loans into groups according to performance indexes by using machine learning methods, thus identifying the relevant features that characterize successful loans.

Figure 2. Sharpe ratio for each risk group. The SR is calculated as the average return of the fully paid loans divided by the standard deviation of those returns.
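
The index described in the caption is the mean of the per-loan returns in a grade divided by their standard deviation. The sketch below illustrates that calculation under stated assumptions: it uses the sample standard deviation (the text does not say whether the sample or population form was used), and the function name `sharpe_ratio` and the return values are hypothetical, not taken from the Lending Club data.

```python
from statistics import mean, stdev


def sharpe_ratio(returns):
    """Mean return divided by the sample standard deviation of the returns."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    spread = stdev(returns)  # assumption: sample (n - 1) standard deviation
    if spread == 0:
        raise ValueError("standard deviation is zero")
    return mean(returns) / spread


# Hypothetical per-loan returns for two grades, for illustration only.
sample_returns = {
    "A": [0.05, 0.06, 0.07],
    "B": [0.02, 0.10, 0.12],
}
sr_by_grade = {grade: sharpe_ratio(r) for grade, r in sample_returns.items()}
for grade, sr in sr_by_grade.items():
    print(grade, round(sr, 3))
```

In this toy example grade B has the higher average return but the lower ratio, because its returns are far more dispersed. This is the kind of effect that can make performance non-monotonic across risk groups.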