Programming for Data Science - Loan Kiva Dataset

Dedy Gusnadi Sianipar

4/8/2021

1. Explanation

1.1 Brief

“Hi, welcome to my Rmd :) In this LBB I will reuse the dataset from before, loan_kiva.csv.”

1.2 Data’s Point of View

Kiva.org is a non-profit online crowdfunding platform that allows individuals to borrow funds for business purposes. Its mission is to improve the welfare of marginalized people (especially low-income entrepreneurs and students) in several countries. Crowdfunding is the practice of collecting funds from many individuals to finance new business ventures.

Note: a detailed explanation is given at the end of the content.

2. Data Preparation

2. Input Data

loan_kiva = read.csv("loan_kiva.csv")

2.1 Data Inspection

head(loan_kiva)
##       id funded_amount loan_amount            activity         sector  country
## 1 653051           300         300 Fruits & Vegetables           Food Pakistan
## 2 653053           575         575            Rickshaw Transportation Pakistan
## 3 653068           150         150      Transportation Transportation    India
## 4 653063           200         200          Embroidery           Arts Pakistan
## 5 653084           400         400          Milk Sales           Food Pakistan
## 6 653067           200         200               Dairy    Agriculture    India
##         region currency partner_id         posted_time         funded_time
## 1       Lahore      PKR        247 2014-01-01 06:12:39 2014-01-02 10:06:32
## 2       Lahore      PKR        247 2014-01-01 06:51:08 2014-01-02 09:17:23
## 3    Maynaguri      INR        334 2014-01-01 09:58:07 2014-01-01 16:01:36
## 4       Lahore      PKR        247 2014-01-01 08:03:11 2014-01-01 13:00:00
## 5 Abdul Hakeem      PKR        245 2014-01-01 11:53:19 2014-01-01 19:18:51
## 6    Maynaguri      INR        334 2014-01-01 09:51:02 2014-01-01 17:18:09
##   term_in_months lender_count repayment_interval
## 1             12           12          irregular
## 2             11           14          irregular
## 3             43            6             bullet
## 4             11            8          irregular
## 5             14           16            monthly
## 6             43            8             bullet
tail(loan_kiva)
##             id funded_amount loan_amount                  activity      sector
## 323274 1002658           225         225                    Sewing    Services
## 323275 1002602          1500        1500 Personal Housing Expenses     Housing
## 323276 1002761          1500        1500                   Farming Agriculture
## 323277 1002668           725         725              Beauty Salon    Services
## 323278 1002832           550         550     Food Production/Sales        Food
## 323279 1002773           500         500             Grocery Store        Food
##            country                                      region currency
## 323274  Tajikistan                                 Tursun-zoda      TJS
## 323275    Cambodia                                Kampong Speu      KHR
## 323276    Cambodia Kampong Cham province, Ponhea Krek district      KHR
## 323277    Pakistan                                      Lahore      PKR
## 323278 El Salvador                                   La Unión      USD
## 323279       Kenya                                         Voi      KES
##        partner_id         posted_time         funded_time term_in_months
## 323274         63 2015-12-31 05:33:47 2015-12-31 19:47:21             14
## 323275        106 2015-12-31 01:57:56 2015-12-31 21:00:58             26
## 323276        204 2015-12-31 10:54:52 2016-01-03 03:48:36             13
## 323277        247 2015-12-31 06:18:49 2016-01-27 17:52:55             13
## 323278        199 2015-12-31 15:26:08 2016-01-05 00:28:49             20
## 323279        164 2015-12-31 11:01:43 2015-12-31 22:08:27             13
##        lender_count repayment_interval
## 323274            8            monthly
## 323275           56            monthly
## 323276           54            monthly
## 323277           28            monthly
## 323278           21            monthly
## 323279           12          irregular
names(loan_kiva)
##  [1] "id"                 "funded_amount"      "loan_amount"       
##  [4] "activity"           "sector"             "country"           
##  [7] "region"             "currency"           "partner_id"        
## [10] "posted_time"        "funded_time"        "term_in_months"    
## [13] "lender_count"       "repayment_interval"

From our inspection we can conclude that the loan_kiva data contains 323279 rows and 14 columns. The columns are:

id : Unique ID of the loan
funded_amount : Amount disbursed by Kiva to the agent (USD)
loan_amount : Amount distributed by the agent to the borrower (USD)
activity : A more specific category than sector
sector : Category of the loan
country : Full name of the country where the loan is disbursed
region : Full name of the region within the country
currency : Currency of the loan
partner_id : ID of the partner organization
posted_time : Time the loan was posted on Kiva by the agent
funded_time : Time the loan was fully financed by the lenders
term_in_months : Duration of loan disbursement (in months)
lender_count : Number of lenders who contributed
repayment_interval : How the loan is paid off
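The row and column counts quoted above can be read directly with dim(), nrow() and ncol(). The sketch below uses a small toy data frame (the name toy and its three columns are made up from the first rows printed by head()), so it runs without the CSV file. On the real data, dim(loan_kiva) should agree with the 323279 observations and 14 variables reported by str().

```r
# Toy data frame copied from the first three rows of head(loan_kiva);
# it stands in for the full dataset so the example is self-contained.
toy <- data.frame(
  id            = c(653051, 653053, 653068),
  funded_amount = c(300, 575, 150),
  country       = c("Pakistan", "Pakistan", "India")
)

dims   <- dim(toy)    # integer vector: c(rows, columns)
n_rows <- nrow(toy)   # number of rows only
n_cols <- ncol(toy)   # number of columns only

print(dims)
```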

2.2 Data Cleansing & Coercion

Check the data type of each column

str(loan_kiva)
## 'data.frame':    323279 obs. of  14 variables:
##  $ id                : int  653051 653053 653068 653063 653084 653067 653078 653082 653048 653060 ...
##  $ funded_amount     : num  300 575 150 200 400 200 400 475 625 200 ...
##  $ loan_amount       : num  300 575 150 200 400 200 400 475 625 200 ...
##  $ activity          : chr  "Fruits & Vegetables" "Rickshaw" "Transportation" "Embroidery" ...
##  $ sector            : chr  "Food" "Transportation" "Transportation" "Arts" ...
##  $ country           : chr  "Pakistan" "Pakistan" "India" "Pakistan" ...
##  $ region            : chr  "Lahore" "Lahore" "Maynaguri" "Lahore" ...
##  $ currency          : chr  "PKR" "PKR" "INR" "PKR" ...
##  $ partner_id        : int  247 247 334 247 245 334 245 245 247 247 ...
##  $ posted_time       : chr  "2014-01-01 06:12:39" "2014-01-01 06:51:08" "2014-01-01 09:58:07" "2014-01-01 08:03:11" ...
##  $ funded_time       : chr  "2014-01-02 10:06:32" "2014-01-02 09:17:23" "2014-01-01 16:01:36" "2014-01-01 13:00:00" ...
##  $ term_in_months    : int  12 11 43 11 14 43 14 14 11 11 ...
##  $ lender_count      : int  12 14 6 8 16 8 8 19 24 3 ...
##  $ repayment_interval: chr  "irregular" "irregular" "bullet" "irregular" ...

From this result, we find that some columns are not stored in the correct data type, so we need to convert them (data coercion)

# Categorical columns are converted to factors
factor_cols <- c("activity", "sector", "country", "region", "currency", "repayment_interval")
loan_kiva[, factor_cols] <- lapply(loan_kiva[, factor_cols], as.factor)

# Timestamps are converted to Date; the time of day is dropped
loan_kiva$posted_time <- as.Date(loan_kiva$posted_time, "%Y-%m-%d %H:%M:%S")
loan_kiva$funded_time <- as.Date(loan_kiva$funded_time, "%Y-%m-%d %H:%M:%S")


str(loan_kiva)
## 'data.frame':    323279 obs. of  14 variables:
##  $ id                : int  653051 653053 653068 653063 653084 653067 653078 653082 653048 653060 ...
##  $ funded_amount     : num  300 575 150 200 400 200 400 475 625 200 ...
##  $ loan_amount       : num  300 575 150 200 400 200 400 475 625 200 ...
##  $ activity          : Factor w/ 154 levels "Adult Care","Agriculture",..: 62 127 140 50 90 42 11 87 60 127 ...
##  $ sector            : Factor w/ 15 levels "Agriculture",..: 7 14 14 2 7 1 13 10 7 14 ...
##  $ country           : Factor w/ 82 levels "Afghanistan",..: 52 52 27 52 52 27 52 52 52 52 ...
##  $ region            : Factor w/ 9204 levels "","\"\"The first May\"\" village",..: 4377 4377 5172 4377 115 5172 2645 4377 4377 4377 ...
##  $ currency          : Factor w/ 66 levels "ALL","AMD","AZN",..: 44 44 22 44 44 22 44 44 44 44 ...
##  $ partner_id        : int  247 247 334 247 245 334 245 245 247 247 ...
##  $ posted_time       : Date, format: "2014-01-01" "2014-01-01" ...
##  $ funded_time       : Date, format: "2014-01-02" "2014-01-02" ...
##  $ term_in_months    : int  12 11 43 11 14 43 14 14 11 11 ...
##  $ lender_count      : int  12 14 6 8 16 8 8 19 24 3 ...
##  $ repayment_interval: Factor w/ 3 levels "bullet","irregular",..: 2 2 1 2 3 1 3 3 2 2 ...

Each column has now been converted to the desired data type

Check for missing values
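One side effect of the date coercion is worth illustrating: as.Date() keeps only the calendar date, so the hour, minute and second in posted_time and funded_time are discarded. The sketch below is an alternative, not what this report does. It uses as.POSIXct() to keep the time of day, with two timestamps copied from the head() output. The variable names and the choice of tz = "UTC" are assumptions, since the source does not state a time zone.

```r
# Two timestamp pairs copied from head(loan_kiva)
posted <- c("2014-01-01 06:12:39", "2014-01-01 09:58:07")
funded <- c("2014-01-02 10:06:32", "2014-01-01 16:01:36")
fmt <- "%Y-%m-%d %H:%M:%S"

# as.Date() keeps the calendar date and drops the time of day
posted_date <- as.Date(posted, format = fmt)

# as.POSIXct() keeps the full timestamp (tz = "UTC" is an assumption)
posted_dt <- as.POSIXct(posted, format = fmt, tz = "UTC")
funded_dt <- as.POSIXct(funded, format = fmt, tz = "UTC")

# Time needed to fully fund each loan, in hours
hours_to_fund <- as.numeric(difftime(funded_dt, posted_dt, units = "hours"))

print(posted_date)
print(hours_to_fund)
```

With Date columns the same difference can only be expressed in whole days, which may be enough for a monthly or yearly summary.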

colSums(is.na(loan_kiva))
##                 id      funded_amount        loan_amount           activity 
##                  0                  0                  0                  0 
##             sector            country             region           currency 
##                  0                  0                  0                  0 
##         partner_id        posted_time        funded_time     term_in_months 
##                  0                  0                  0                  0 
##       lender_count repayment_interval 
##                  0                  0
anyNA(loan_kiva)
## [1] FALSE

Great! There are no NA values
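Note that is.na() only detects NA. The str() output above lists an empty string ("") as the first level of region, and empty strings are not counted as missing by colSums(is.na(...)). The sketch below shows one way to count them on a toy data frame. The toy values and helper names are made up for illustration, and whether blank regions should be treated as missing is a judgment call for the analyst.

```r
# Toy data with two blank regions; stands in for loan_kiva
toy <- data.frame(
  region  = c("Lahore", "", "Maynaguri", ""),
  country = c("Pakistan", "Kenya", "India", "Peru"),
  stringsAsFactors = TRUE
)

# NA check, as used in the report: finds nothing
na_counts <- colSums(is.na(toy))

# Empty-string check: counts "" in each column
empty_counts <- sapply(toy, function(col) sum(as.character(col) == ""))

# Optional: recode "" as NA so later NA checks pick it up
region_clean <- as.character(toy$region)
region_clean[region_clean == ""] <- NA

print(na_counts)
print(empty_counts)
```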

3. Data Explanation

Brief explanation

summary(loan_kiva)
##        id          funded_amount       loan_amount      
##  Min.   : 653047   Min.   :    25.0   Min.   :    25.0  
##  1st Qu.: 737420   1st Qu.:   275.0   1st Qu.:   275.0  
##  Median : 827056   Median :   500.0   Median :   500.0  
##  Mean   : 826774   Mean   :   828.8   Mean   :   828.8  
##  3rd Qu.: 915291   3rd Qu.:  1000.0   3rd Qu.:  1000.0  
##  Max.   :1002884   Max.   :100000.0   Max.   :100000.0  
##                                                         
##                       activity              sector             country      
##  Farming                  : 33610   Agriculture:86509   Philippines: 81199  
##  General Store            : 31087   Food       :68752   Kenya      : 31947  
##  Personal Housing Expenses: 15616   Retail     :62118   El Salvador: 20543  
##  Agriculture              : 14309   Services   :20550   Cambodia   : 13402  
##  Food Production/Sales    : 13950   Housing    :16318   Peru       : 12799  
##  Retail                   : 13728   Clothing   :15840   Uganda     : 11832  
##  (Other)                  :200979   (Other)    :53192   (Other)    :151557  
##         region          currency        partner_id     posted_time        
##            : 26253   PHP    : 81199   Min.   :  9.0   Min.   :2014-01-01  
##  Kaduna    :  5466   USD    : 52751   1st Qu.:125.0   1st Qu.:2014-07-11  
##  Lahore    :  4322   KES    : 31467   Median :145.0   Median :2015-01-12  
##  Kisii     :  3324   PEN    : 12225   Mean   :166.7   Mean   :2015-01-08  
##  Cusco     :  3013   UGX    : 11772   3rd Qu.:199.0   3rd Qu.:2015-07-09  
##  Thanh Hoá:  2099   PKR    : 11647   Max.   :469.0   Max.   :2015-12-31  
##  (Other)   :278802   (Other):122218                                       
##   funded_time         term_in_months   lender_count     repayment_interval
##  Min.   :2014-01-01   Min.   :  2.0   Min.   :   1.00   bullet   : 32653  
##  1st Qu.:2014-07-24   1st Qu.:  8.0   1st Qu.:   8.00   irregular:130580  
##  Median :2015-01-24   Median : 13.0   Median :  15.00   monthly  :160046  
##  Mean   :2015-01-23   Mean   : 13.9   Mean   :  22.85                     
##  3rd Qu.:2015-07-24   3rd Qu.: 14.0   3rd Qu.:  28.00                     
##  Max.   :2016-02-25   Max.   :158.0   Max.   :2986.00                     
## 

Summary :

  1. The first loan was posted on January 1, 2014
  2. Farming was the most popular activity (33,610 loans)
  3. Agriculture was the most popular sector (86,509 loans)
  4. The Philippines had the largest number of loans (81,199)
  5. The average funded amount was 828.8 USD, with a maximum of 100,000 USD
  6. The mean lender count was 22.85 lenders, with a maximum of 2,986
  7. The most common repayment interval was monthly (160,046 loans)
  8. The mean term was 13.9 months, with a maximum of 158 months
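The counts that summary() prints for factor columns are numbers of loans, not amounts of money. The sketch below shows how the two can be computed separately, using a toy data frame copied from the six rows printed by head(). The names toy, loans_per_country and amount_per_country are illustrative only, and the results on the toy rows say nothing about the full dataset.

```r
# Six rows copied from head(loan_kiva)
toy <- data.frame(
  sector        = c("Food", "Transportation", "Transportation",
                    "Arts", "Food", "Agriculture"),
  country       = c("Pakistan", "Pakistan", "India",
                    "Pakistan", "Pakistan", "India"),
  funded_amount = c(300, 575, 150, 200, 400, 200),
  lender_count  = c(12, 14, 6, 8, 16, 8)
)

# Number of loans per country, largest first
loans_per_country <- sort(table(toy$country), decreasing = TRUE)
top_country <- names(loans_per_country)[1]

# Total funded amount per country (a different question from the count)
amount_per_country <- aggregate(funded_amount ~ country, data = toy, FUN = sum)

# Simple numeric summaries
mean_funded <- mean(toy$funded_amount)
max_lenders <- max(toy$lender_count)

print(loans_per_country)
print(amount_per_country)
```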