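rm(list = ls(all.names = TRUE))   # clear the workspace, including hidden objects
options(digits = 4, scipen = 12)  # 4 significant digits, avoid scientific notation
library(magrittr)                 # pipe operator %>%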

Introduction

Topic: Use borrower data to predict whether a borrower will repay the loan.



1 Preparing the Dataset

【1.1 Base Rate】What proportion of the loans in the dataset were not paid in full?
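
A minimal sketch of the base-rate calculation, assuming the outcome is stored as a 0/1 indicator. The data frame `loans`, the column name `not.fully.paid`, and the toy values are assumptions for illustration, not the real dataset.

```r
# Toy stand-in for the loan data (hypothetical values, not the real dataset)
loans <- data.frame(not.fully.paid = c(0, 0, 1, 0, 1, 0, 0, 0))

# The mean of a 0/1 indicator is the proportion of 1s
prop_not_paid <- mean(loans$not.fully.paid)
prop_not_paid  # 0.25
```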

【1.2 Checking for Missing Values】Which of the following variables has at least one missing observation?
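
One way to check for missing values is to count `NA` entries per column. The sketch below uses a toy data frame; the column names and values are hypothetical.

```r
# Toy data with missing entries (hypothetical columns and values)
loans <- data.frame(
  int.rate = c(0.11, 0.13, NA, 0.09),
  fico     = c(700, 710, 680, 720),
  inq      = c(NA, 1, 0, NA)
)

# Count missing values per column
na_counts <- colSums(is.na(loans))

# Names of the columns with at least one missing observation
cols_with_na <- names(na_counts)[na_counts > 0]
cols_with_na
```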

【1.3 Deciding Whether to Impute Missing Values】Which of the following is the best reason to fill in the missing values for these variables instead of removing observations with missing data?

【1.4 Imputation Tool】What best describes the process we just used to handle missing values?



2 Prediction Models

【2.1 Significance】Which independent variables are significant in our model?

【2.2 Estimating Marginal Effects from Regression Coefficients】Consider two loan applications that are identical except that the borrower in Application A has a FICO credit score of 700 while the borrower in Application B has a FICO credit score of 710. What is the value of Logit(A) - Logit(B)? What is the value of O(A)/O(B)?


【The Difference in Logits】

\[\begin{align} \text{Logit}(A)-\text{Logit}(B) &= \log(O_A)-\log(O_B) \\ &= f(X_A)-f(X_B) \\ &= b_{\text{FICO}}(\text{FICO}_A - \text{FICO}_B) \end{align}\]


【The Ratio of Odds】

\[ \exp(\log(O_A) - \log(O_B)) = \frac{\exp(\log(O_A))}{\exp(\log(O_B))} = \frac{O_A}{O_B}\]

# the difference of logits
# the ratio of odds
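
The two formulas above can be sketched numerically as follows. The coefficient `b_fico` is a placeholder value chosen for illustration only; in practice it would be read from the fitted model's coefficients.

```r
# Hypothetical FICO coefficient; replace with the value from the fitted model
b_fico <- -0.01

fico_A <- 700
fico_B <- 710

# the difference of logits: b_FICO * (FICO_A - FICO_B)
logit_diff <- b_fico * (fico_A - fico_B)

# the ratio of odds: exponential of the logit difference
odds_ratio <- exp(logit_diff)

logit_diff  # 0.1
odds_ratio  # about 1.105
```

With a negative coefficient, the lower score in Application A gives a positive logit difference, so A's odds of not being paid in full are higher than B's.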

【2.3 Confusion Matrix, Accuracy vs. Baseline Rate】What is the accuracy of the logistic regression model? What is the accuracy of the baseline model?

# test accuracy
# baseline accuracy
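
A minimal sketch of both accuracy calculations, assuming a 0.5 cutoff. The vectors `actual` and `pred` are hypothetical stand-ins for the test-set outcomes and the model's predicted probabilities.

```r
# Hypothetical test-set outcomes (1 = not paid in full) and predicted probabilities
actual <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 1)
pred   <- c(0.1, 0.2, 0.1, 0.3, 0.4, 0.2, 0.7, 0.4, 0.1, 0.3)

# Confusion matrix at a 0.5 cutoff
cm <- table(actual = actual, predicted = pred > 0.5)

# test accuracy: share of loans where the thresholded prediction matches the outcome
test_acc <- mean((pred > 0.5) == (actual == 1))

# baseline accuracy: always predict the most frequent class
base_acc <- max(table(actual)) / length(actual)

test_acc  # 0.8
base_acc  # 0.7
```

Computing accuracy with `mean()` rather than from the diagonal of `cm` avoids a pitfall: if no loan crosses the cutoff, the table has a single column and its diagonal no longer counts the correct predictions.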

【2.4 ROC & AUC】Use the ROCR package to compute the test set AUC.

# test set AUC
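
A short sketch of the AUC calculation with ROCR, assuming the package is installed. The vectors `actual` and `pred` are hypothetical toy values, not the real test set.

```r
library(ROCR)

# Hypothetical test-set labels (1 = not paid in full) and predicted probabilities
actual <- c(0, 0, 1, 1)
pred   <- c(0.1, 0.4, 0.35, 0.8)

# Build the ROCR prediction object, then extract the AUC from the performance object
rocr_pred <- prediction(pred, actual)
auc <- performance(rocr_pred, "auc")@y.values[[1]]
auc  # 0.75
```

In this toy example, three of the four (positive, negative) pairs are ranked correctly, which is where the 0.75 comes from.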



3 Smart Baseline

【3.1 Smart Baseline Model】The variable int.rate is highly significant in the bivariate model, but it is not significant at the 0.05 level in the model trained with all the independent variables. What is the most likely explanation for this difference?

【3.2 Predictions of the Smart Baseline Model】What is the highest predicted probability of a loan not being paid in full on the testing set? With a logistic regression cutoff of 0.5, how many loans would be predicted as not being paid in full on the testing set?
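
A sketch of how these two quantities could be computed from a bivariate logistic regression on `int.rate`. The data are simulated purely for illustration, and the names `train`, `test`, `bivariate`, and `not.fully.paid` are assumptions, so the printed numbers say nothing about the real dataset.

```r
# Simulated stand-in data (hypothetical; not the real loans)
set.seed(1)
n <- 200
train <- data.frame(int.rate = runif(n, 0.06, 0.22))
train$not.fully.paid <- rbinom(n, 1, plogis(-3 + 12 * train$int.rate))
test <- data.frame(int.rate = runif(50, 0.06, 0.22))

# Bivariate logistic regression: default risk explained by interest rate only
bivariate <- glm(not.fully.paid ~ int.rate, data = train, family = binomial)

# Predicted probabilities of not being paid in full on the test set
pred_biv <- predict(bivariate, newdata = test, type = "response")

max(pred_biv)        # highest predicted probability
sum(pred_biv > 0.5)  # number of loans predicted as not paid in full
```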

【3.3 Discrimination (AUC) of the Smart Baseline Model】What is the test set AUC of the bivariate model?



4 Computing the Profitability of an Investment

【4.1 Computing Investment Value】How much does a $10 investment with an annual interest rate of 6% pay back after 3 years, using continuous compounding of interest?
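
Under continuous compounding, an investment of c dollars at annual rate r is worth c * exp(rt) after t years. A direct calculation, with variable names chosen here for readability:

```r
principal <- 10    # amount invested, in dollars
rate      <- 0.06  # annual interest rate
years     <- 3     # investment horizon

# Value after continuous compounding: c * exp(r * t)
payback <- principal * exp(rate * years)
payback  # about 11.97
```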

【4.2 Computing Profit, Loan Paid in Full】While the investment has value c * exp(rt) dollars after collecting interest, the investor had to pay $c for the investment. What is the profit to the investor if the investment is paid back in full?

【4.3 Computing Profit, Default】Now, consider the case where the investor made a $c investment, but it was not paid back in full. Assume, conservatively, that no money was received from the borrower (often a lender will receive some but not all of the value of the loan, making this a pessimistic assumption of how much is received). What is the profit to the investor in this scenario?

# paid back in full (4.2): profit = c * exp(rt) - c
# not paid back, nothing received (4.3): profit = -c
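
The two profit cases from 4.2 and 4.3 can be combined in one small helper. The function name `loan_profit` and its argument names are hypothetical, introduced only for this sketch.

```r
# Profit on an investment of `principal` dollars at annual rate `rate`
# over `years` years, with continuous compounding.
# paid_in_full = TRUE : investor receives principal * exp(rate * years)
# paid_in_full = FALSE: investor receives nothing (the pessimistic assumption)
loan_profit <- function(principal, rate, years, paid_in_full) {
  ifelse(paid_in_full, principal * (exp(rate * years) - 1), -principal)
}

loan_profit(10, 0.06, 3, TRUE)   # about 1.97
loan_profit(10, 0.06, 3, FALSE)  # -10
```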



5 A Simple Investment Strategy

【5.1 Actual Return on the Test Data】What is the maximum profit of a $10 investment in any loan in the testing set?
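
A sketch of the per-loan profit calculation, assuming every loan runs for 3 years and that a loan not paid in full returns nothing. The `test` data frame, its column `not.fully.paid`, and the toy values are hypothetical.

```r
# Hypothetical test set: interest rates and repayment outcomes
test <- data.frame(
  int.rate       = c(0.08, 0.12, 0.20, 0.15),
  not.fully.paid = c(0, 0, 1, 0)
)

# Profit per $1 invested over an assumed 3-year term
test$profit <- exp(test$int.rate * 3) - 1
test$profit[test$not.fully.paid == 1] <- -1  # nothing recovered on default

# Maximum profit of a $10 investment in any single loan
max_profit_10 <- 10 * max(test$profit)
max_profit_10
```

In this toy data the loan with the highest rate defaults, so the maximum comes from the 15% loan rather than the 20% one.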



6 An Investment Strategy Based on Risk

A simple strategy of investing equally in all the loans would yield a profit of $20.94 on a $100 investment. However, this strategy does not make use of the prediction model we built earlier in this problem.

【6.1 High Interest, High Risk】What is the average profit of a $1 investment in one of these high-interest loans (do not include the $ sign in your answer)? What proportion of the high-interest loans were not paid back in full?
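
A sketch of the subset calculation. The 0.15 cutoff used to define "high-interest", the 3-year term, the column names, and the toy values are all assumptions made for illustration.

```r
# Hypothetical test set
test <- data.frame(
  int.rate       = c(0.10, 0.16, 0.18, 0.15, 0.20),
  not.fully.paid = c(0, 0, 1, 0, 1)
)

# Profit per $1 invested over an assumed 3-year term; -1 on default
test$profit <- ifelse(test$not.fully.paid == 1, -1, exp(test$int.rate * 3) - 1)

# High-interest subset; the 0.15 threshold is an assumed value
high_interest <- subset(test, int.rate >= 0.15)

avg_profit   <- mean(high_interest$profit)          # average profit per $1 invested
prop_default <- mean(high_interest$not.fully.paid)  # proportion not paid back in full
prop_default  # 0.5
```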

【6.2 Low Risk among High-Interest Loans】What is the profit of an investor who invested $1 in each of these 100 loans? How many of the 100 selected loans were not paid back in full?
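
One way to pick the 100 lowest-risk loans is to sort the high-interest loans by predicted risk. The sketch below uses simulated data; `high_interest`, `predicted.risk`, and the 3-year term are assumptions, and the resulting numbers are not the answer for the real dataset.

```r
# Simulated high-interest loans with model-predicted default risk (hypothetical)
set.seed(42)
high_interest <- data.frame(
  int.rate       = runif(300, 0.15, 0.22),
  predicted.risk = runif(300, 0.05, 0.60)
)
high_interest$not.fully.paid <- rbinom(300, 1, high_interest$predicted.risk)
high_interest$profit <- ifelse(high_interest$not.fully.paid == 1, -1,
                               exp(high_interest$int.rate * 3) - 1)

# Keep the 100 loans with the lowest predicted risk
selected <- high_interest[order(high_interest$predicted.risk)[1:100], ]

sum(selected$profit)          # total profit from $1 in each selected loan
sum(selected$not.fully.paid)  # number of selected loans not paid in full
```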



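【Q】Using the model we built, can you design an investment method that earns a higher profit than the approaches above? Describe your approach in detail.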

#
#
#
#
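
One possible direction, offered only as a sketch, is to rank loans by expected profit so that interest rate and predicted default risk are weighed together instead of filtering on each separately. The column names, the 3-year term, and the toy values are assumptions.

```r
# Hypothetical loans: interest rate and model-predicted default probability
loans <- data.frame(
  int.rate       = c(0.10, 0.18, 0.21, 0.16),
  predicted.risk = c(0.05, 0.45, 0.20, 0.10)
)

# Expected profit per $1 over an assumed 3-year term:
# (1 - p) * (exp(3r) - 1) + p * (-1)
loans$expected.profit <-
  (1 - loans$predicted.risk) * (exp(loans$int.rate * 3) - 1) - loans$predicted.risk

# Rank loans from most to least attractive
ranked <- loans[order(-loans$expected.profit), ]
ranked
```

In this toy example the 18% loan ranks last despite its high rate, because its predicted risk makes the expected profit negative.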





