Introduction

Kiva is a non-profit organization that allows people to lend money via the Internet to low-income/underserved entrepreneurs and students in 82 countries. Kiva’s mission is to connect people through lending to alleviate poverty.

Kiva itself does not collect any interest on the loans it facilitates and Kiva lenders do not earn interest on loans. Kiva is purely supported by grants, loans, and donations from its users, corporations, and national institutions.

The Kiva API provides access to Kiva’s public data, such as loan data and lender data. The analysis in this report primarily uses the loan data.

WDI (World Development Indicators) presents the most current and accurate global development data available, and includes national, regional and global estimates.

Objective

One objective of this analysis is to help Kiva and its funders better alleviate poverty by:

  • showing which countries, areas and sectors loans mainly come from in terms of loan amount;
  • displaying the loan amount distribution and loan status characteristics by gender.

Another objective of this report is to provide information about defaulted loans by sector, so that Kiva and its funders can prepare ways to reduce or avoid those risks.

Dataset Description

Variables of Interest:

The following list is a subset of variables that we are interested in.

## [1] "status"                 "sector"                
## [3] "country_code"           "country"               
## [5] "loan_amount"            "gender"                
## [7] "posted_year"            "GDP_annual_growth_rate"
## [9] " GDP_per_capita"

Let’s take a closer look at each variable. For simplicity, the variables will be defined in the same order they appear in the data frame.

Status

The status of a loan is an especially important concept to understand when working with loans. The levels of status are listed below, followed by what each one means.

## [1] "N/A"          "defaulted"    "expired"      "funded"      
## [5] "fundraising"  "in_repayment" "paid"         "refunded"
  • defaulted

Occasionally, a borrower or a field partner may fail to make payments on a loan, either to the field partner or to Kiva, respectively. Usually when this happens, a loan simply becomes delinquent and remains in the in_repayment status. When a loan remains delinquent 6 months after the end of the loan payment schedule, the loan becomes defaulted. Defaulted loans are often never paid back and are a financial loss to the lenders of that loan. Most loans only default in part, but it is possible for the entire amount of the loan to not be repaid.

  • expired

On Kiva, if a loan doesn’t fully fund within 30 days of being listed on the website, it “expires.” This means three things:

  1. The loan profile remains on the site but shows up as “expired.”
  2. All lenders who have already contributed to the loan are refunded.
  3. The Field Partner administering the loan doesn’t receive any of the funds.

  • funded

This status indicates that the loan request has been completely funded and is not available for new loans by lenders. The loan may be waiting for disbursal to the borrower(s), assuming that the loan was posted to Kiva before the field partner disbursed the funds. The field funded_date shows the exact time at which the loan was fully funded by Kiva.

  • fundraising

This status indicates that the loan has not yet been funded. This typically represents the type of loan that is advertised on the front page of Kiva.org and on the Lend tab. Lenders can only lend to loans that are in the fundraising status. The variable funded_amount contains the amount that has been funded so far.

  • in_repayment

When a loan is in repayment, the loan has been disbursed to the borrower(s) and they are in the process of using the funds and making payments on the loan to the field partner. Loans in this status may see journal updates and lenders of this loan will receive repayments when the borrower’s payments are reconciled by Kiva.

  • paid

Paid indicates that the loan has been paid back in full by the borrower. The payments have been distributed back to the lenders and the loan is closed to most new activity.

  • refunded

Although rare, Kiva may sometimes need to refund the funded portion of the loan to the lenders after the loan has been partially funded, fully funded, or even during repayment. There are many reasons why a loan could be refunded, but usually it is because there is an error with the loan posting or the loan itself has been found to violate Kiva’s policies. Currently, it is not possible to search for loans with this status.

Some of the terms used in the Kiva API for loan status do not always align with the terminology used on the website. Although the terms used in the API are clear and acceptable for communicating to lenders, the terminology on the site may be used as well.

See Kiva Loans Documentation for additional details.

Sector

  • Description: This character variable represents the sector name for the requested loan.

The following list shows sector names.

##  [1] "Agriculture"    "Arts"           "Clothing"       "Construction"  
##  [5] "Education"      "Entertainment"  "Food"           "Health"        
##  [9] "Housing"        "Manufacturing"  "Personal Use"   "Retail"        
## [13] "Services"       "Transportation" "Wholesale"

Country Code

  • Description: This character variable contains the country code or country name abbreviation.

The following list shows the country codes.

##  [1] "AF" "AL" "AM" "AZ" "BA" "BF" "BG" "BI" "BJ" "BO" "BR" "BW" "BZ" "CL"
## [15] "CM" "CN" "CO" "CR" "DO" "EC" "GE" "GH" "GT" "HN" "HT" "ID" "IL" "IN"
## [29] "IQ" "JO" "KE" "KH" "LA" "LB" "LK" "LR" "LS" "MD" "MG" "ML" "MN" "MR"
## [43] "MW" "MX" "MZ" "NA" "NG" "NI" "NP" "PA" "PE" "PG" "PH" "PK" "PY" "RW"
## [57] "SB" "SL" "SN" "SO" "SR" "SV" "TD" "TG" "TH" "TJ" "TL" "TR" "TZ" "UA"
## [71] "UG" "US" "VN" "VU" "WS" "XK" "ZA" "ZM" "ZW"

This list corresponds directly to the Country variable.

Country

  • Description: This character variable contains the country name of the borrower.

The following list shows the first five country names in the data frame.

## Source: local data frame [5 x 1]
## 
##      country
##        (chr)
## 1 Azerbaijan
## 2    Vietnam
## 3    Vietnam
## 4     Mexico
## 5    Ecuador

Loan Amount

  • Description: This integer variable represents the amount of money distributed to the borrower in the lender’s currency.

Simple statistics for the variable:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    25.0   325.0   550.0   746.2   950.0 10000.0

Gender

  • Description: This is a character variable that represents the gender of the borrower.

Simple statistics for the variable:

## Source: local data frame [2 x 4]
## 
##   gender  count     mean   med
##    (chr)  (int)    (dbl) (dbl)
## 1      F 707135 807.3935   500
## 2      M 242359 942.9649   700
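A summary table like the one above can be produced by grouping the loans on gender. The sketch below uses base R on a small made-up data frame (`toy.df` is hypothetical and is not the Kiva data), so its numbers are illustrative only.

```r
# Toy data standing in for the loan data (values are made up)
toy.df <- data.frame(
  gender      = c("F", "F", "F", "M", "M"),
  loan_amount = c(300, 500, 1000, 700, 900),
  stringsAsFactors = FALSE
)

# Count, mean and median of loan_amount for each gender
gender.summary <- data.frame(
  gender = sort(unique(toy.df$gender)),
  count  = as.vector(table(toy.df$gender)),
  mean   = as.vector(tapply(toy.df$loan_amount, toy.df$gender, mean)),
  med    = as.vector(tapply(toy.df$loan_amount, toy.df$gender, median))
)
print(gender.summary)
```

Comparing the mean with the median in each row gives a quick sense of how skewed the loan amounts are within each group.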

Posted Year

  • Description: The year the loan was posted on Kiva.
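As a minimal sketch of how such a year could be derived, the code below assumes the raw data stores the posting time as an ISO 8601 timestamp string. The name `posted_date`, the timestamp format and the example values are assumptions made for illustration; they are not taken from the cleaning script.

```r
# Example timestamps in the assumed ISO 8601 format (made-up values)
posted_date <- c("2012-03-15T08:30:00Z", "2014-11-02T17:45:10Z")

# Parse as UTC and extract the calendar year as an integer
parsed <- as.POSIXct(posted_date, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
posted_year <- as.integer(format(parsed, "%Y"))
print(posted_year)
```

The lubridate package, which appears in the library list of this report, offers a `year()` function that serves the same purpose.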

GDP Annual Growth Rate

  • Description: Annual percentage growth rate of GDP at market price based on constant local currency. Aggregates are based on constant 2005 U.S. dollars.

Simple statistics for the variable:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -14.800   3.307   5.840   5.314   7.632  34.500

GDP Per Capita

  • Description: GDP per capita is gross domestic product divided by mid-year population in constant 2005 U.S. dollars.

Simple statistics for the variable:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   147.1   588.3  1273.0  1858.0  2816.0 44320.0

Load Dataset and Libraries

Dataset Preparation

For access to the script used to clean this data, use the following link: DropBox Link

Below is an overview of the steps taken in order to prepare the data.

  • Load libraries:
    • dplyr
    • magrittr
    • RJSONIO
    • rlist
    • lubridate
    • WDI
    • ggplot2
  • Select variables of interest. See Variables of Interest.

  • Load the three raw data sets from Kiva.org and limit the data to only the variables of interest.

  • Bind data sets into one, called kiva.wdi.df.

  • Save the resulting data frame to a file. This is the data frame that will be used for the analysis and research contained in this report.
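The binding and merging steps can be sketched in base R as follows. Everything here is hypothetical: the three toy chunks, the indicator table `wdi.toy` and the choice of country code plus posted year as the merge key are assumptions for illustration, not a description of the actual cleaning script.

```r
# Three toy chunks standing in for the raw Kiva loan files (made-up values)
loans1 <- data.frame(country_code = "KE", posted_year = 2012, loan_amount = 500,
                     stringsAsFactors = FALSE)
loans2 <- data.frame(country_code = "PE", posted_year = 2013, loan_amount = 800,
                     stringsAsFactors = FALSE)
loans3 <- data.frame(country_code = "KE", posted_year = 2013, loan_amount = 300,
                     stringsAsFactors = FALSE)

# Toy indicator table standing in for the WDI download (made-up values)
wdi.toy <- data.frame(
  country_code = c("KE", "KE", "PE"),
  posted_year  = c(2012, 2013, 2013),
  GDP_annual_growth_rate = c(4.6, 5.9, 5.8),
  stringsAsFactors = FALSE
)

# Stack the loan chunks, then attach the indicators by country and year.
# all.x = TRUE keeps loans even when no indicator row matches.
loans.all <- rbind(loans1, loans2, loans3)
merged.df <- merge(loans.all, wdi.toy,
                   by = c("country_code", "posted_year"), all.x = TRUE)
print(merged.df)
```

Keeping unmatched loans means missing indicators show up as NA, which is why later steps need to handle NA values explicitly.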

Analysis

Loan Amount and Country

The following code creates a world map to illustrate the total loan amount in each country.

library(choroplethr)
library(choroplethrMaps)
# Sum total loans per country on map.
country.df <- select(kiva.wdi.df, country_code, loan_amount) #Handpick variables
data(country.regions) #Import country naming convention required
sumloan.df <- aggregate(country.df$loan_amount, by=list(country.df$country_code), FUN=sum) #Group by country and sum loan amount
colnames(sumloan.df)[1] <- "iso2c" #Rename country code to facilitate merge
colnames(sumloan.df)[2] <- "value" #Rename sum to value (required by choropleth)
goodData.df <- merge(sumloan.df, country.regions) #Merge dataframes
goodData.df <- goodData.df[,-1] #Remove country code
gg <- country_choropleth(goodData.df,
                   title = "Kiva Loan Totals",
                   legend = "$",
                   num_colors=1) #Build plot
gg + scale_fill_continuous(low="#eff3ff", high="#084594", na.value="grey")

Findings

  • The darker shades of blue indicate a higher total loan amount. As can be seen, most of the loan volume is in South America, Africa, and Southeast Asia.

Loan Amount and Gender

The following code creates a violin plot to show the distribution of loan amounts by gender.

#Select the gender and loan_amount columns from loans.df into the loan_gender data frame
loans.df %>% select(gender, loan_amount) -> loan_gender  
loan_gender <- na.omit(loan_gender) #Remove rows with NA

#Create violin plot
loan_gender2 <- ggplot(loan_gender, aes(x=gender, y=loan_amount))
loan_gender2 + geom_violin() + scale_y_continuous(limits=c(0,3000))

Findings

  • The plot for females has a more triangular shape that is much wider near the bottom. The plot for males, on the other hand, is more evenly distributed. This indicates that women tend to take out smaller loans than men.

Loan Status and Gender

The following code creates a mosaic plot to show the volume of loans by gender and status.

#Select the gender and status columns and recode status into three groups:
#'Paid', 'Defaulted' (defaulted or expired) and 'Other'
loans.df %>% select(gender, status) %>%
  mutate(paid = ifelse(status=='paid', 'Paid',
                       ifelse(status=='defaulted' | status=='expired','Defaulted','Other'))) %>%
  na.omit -> stts #Remove rows with NA
Palette.r <- c("#DAA520","#4169E1","#8B008B") #One color per status group
#Create mosaic plot
mosaicplot( gender ~ paid, data = stts, color = Palette.r,
            main="",xlab="Gender",ylab="Status", las=1)

Findings

  • The statuses are grouped into paid, defaulted (which in the code above also includes expired loans), and other. Relative to their total number of loans, men appear to default more often than women.
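A mosaic plot shows these shares only visually. As a rough numerical companion, the sketch below computes the share of each status group within each gender. It uses made-up toy data (`toy.df` is hypothetical), so the resulting shares say nothing about the real Kiva loans.

```r
# Toy data (made-up): gender and recoded status group for eight loans
toy.df <- data.frame(
  gender = c("F", "F", "F", "F", "M", "M", "M", "M"),
  paid   = c("Paid", "Paid", "Paid", "Defaulted",
             "Paid", "Paid", "Defaulted", "Defaulted")
)

# Cross-tabulate, then convert each gender's row to shares that sum to 1
counts <- table(toy.df$gender, toy.df$paid)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Using row shares rather than raw counts matters here because the two genders have very different numbers of loans.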

Loan Amount and Sector

Loan Amount by Sector in Countries with the Three Most Common GDP Annual Growth Rate Ranges

The following code creates a density plot to illustrate the distribution (probability density) of GDP annual growth rates.

#Limit GDP annual growth rate to a range between 0% and 15%
kiva.wdi.df_d1 <- kiva.wdi.df[kiva.wdi.df$GDP_annual_growth_rate > 0
                           & kiva.wdi.df$GDP_annual_growth_rate < 15,]

#Line chart: Density plot for distribution of GDP annual growth rate 
#GDP_annual_growth_rate is numeric, so the x axis needs a continuous scale
kiva.wdi.df_d1 %>%
  ggplot(aes(x=GDP_annual_growth_rate)) + 
  geom_line(stat='density',
            color='red')+
  scale_x_continuous(breaks = 0:15)+
  xlab("GDP Annual Growth Rate")+
  ylab("Density")

Findings

  • The density of GDP annual growth rates has its three highest peaks in the ranges 3% to 4%, 6% to 7%, and 7.2% to 8%.

Thus, we will focus on countries whose GDP annual growth rates fall in these three ranges and analyze how total loan amounts are distributed across sectors in those countries. These are the most common growth rates in the data, so those countries are representative in terms of economic development.
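The peak ranges quoted above were read from the density plot. As a rough numerical cross-check, the sketch below locates local maxima of a kernel density estimate in base R. The `growth` vector is simulated for illustration and is not the report's data, so the peaks it finds are not the report's peaks.

```r
set.seed(1)
# Made-up growth rates with clusters near 3.5 and 6.5 (not the report's data)
growth <- c(rnorm(500, mean = 3.5, sd = 0.3), rnorm(500, mean = 6.5, sd = 0.3))

# Kernel density estimate of the values
d <- density(growth)

# A grid point is a local peak if the slope changes from rising to falling
is.peak <- which(diff(sign(diff(d$y))) == -2) + 1
peaks <- data.frame(x = d$x[is.peak], density = d$y[is.peak])

# Order peaks from highest to lowest density
peaks <- peaks[order(-peaks$density), ]
print(head(peaks, 3))
```

The positions of the peaks depend on the smoothing bandwidth, so ranges found this way are approximate.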

The following code selects the loans from countries with GDP annual growth rates in the ranges mentioned above and creates a bar chart to illustrate the total loan amount by sector in those countries.

#Keep loans whose GDP annual growth rate falls in the three most common ranges
kiva.wdi.df[(kiva.wdi.df$GDP_annual_growth_rate < 7 &  
              kiva.wdi.df$GDP_annual_growth_rate > 6) |
              (kiva.wdi.df$GDP_annual_growth_rate > 3 &  
              kiva.wdi.df$GDP_annual_growth_rate < 4) |
              (kiva.wdi.df$GDP_annual_growth_rate > 7.2 &  
              kiva.wdi.df$GDP_annual_growth_rate < 8)
            ,]->kiva.wdi.df1 


#Bar chart: total loan amount by sector (each bar stacks the individual loan amounts)
kiva.wdi.df1[!is.na(kiva.wdi.df1$sector),]%>%
  ggplot(aes(x=sector, y=loan_amount)) +
  geom_bar(stat="identity") +
  xlab("Sector")+ 
  ylab("Total Loan Amount")+
  coord_flip()

Findings

  • In those countries, the top three sectors in terms of total loan amount are food, retail and agriculture.

The following code looks at the same data another way, using a boxplot.

#Create a boxplot to show the median, quartiles and range of loan amount in each sector
#Note: the y-axis limits drop loans above 2500 before the box statistics are computed
kiva.wdi.df1[!is.na(kiva.wdi.df1$sector),]%>%
  ggplot(aes(x=sector,y=loan_amount))+
  geom_boxplot(outlier.size = 0)+ 
  scale_y_continuous(limits=c(0,2500))+
  coord_flip()+
  xlab ("Sector") +
  ylab ("Loan Amount")

Findings

  • Of all sectors, wholesale has the highest median loan amount, at about 600 USD.
  • The median loan amounts across sectors range from approximately 400 USD to 600 USD.
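The medians above were read off the boxplot. A minimal sketch of computing them directly is shown below, using base R on made-up toy data (`toy.df` is hypothetical, so the values do not reflect the Kiva loans).

```r
# Toy loans (made-up values) for three sectors
toy.df <- data.frame(
  sector      = c("Food", "Food", "Food", "Retail", "Retail", "Wholesale"),
  loan_amount = c(300, 400, 800, 450, 550, 600)
)

# Median loan amount per sector, sorted from highest to lowest
medians <- sort(tapply(toy.df$loan_amount, toy.df$sector, median),
                decreasing = TRUE)
print(medians)
```

Applied to the real data, medians computed this way can differ slightly from the boxplot, because the plotting code restricts loan amounts to the range 0 to 2500 first.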

Loan Amount by Sector in Countries with the Three Most Common GDP per Capita Ranges

The following code creates a density plot to illustrate the distribution (probability density) of GDP per capita.

#Limit GDP per capita to a range from 0 to 10000 in USD
kiva.wdi.df_d2 <- kiva.wdi.df[kiva.wdi.df$GDP_per_capita > 0
                           & kiva.wdi.df$GDP_per_capita< 10000,]

#Line chart: Density plot for distribution of GDP per capita 
#GDP_per_capita is numeric, so the x axis needs a continuous scale
kiva.wdi.df_d2 %>%
  ggplot(aes(x=GDP_per_capita)) + 
  geom_line(stat='density',
            color='red')+
  scale_x_continuous(breaks = c(0,500, 1000, 1500, 2500, 3500, 5000, 7500, 10000))

Findings

  • The density of GDP per capita has its three highest peaks in the ranges 500 to 700 USD, 1000 to 1500 USD, and 2900 to 3500 USD.

Therefore, we will focus on countries whose GDP per capita falls in these three ranges and analyze how total loan amounts are distributed across sectors in those countries. These are the most common GDP per capita levels in the data, so those countries are representative in terms of how economic development is shared among people.

The following code selects the loans from countries with GDP per capita in the ranges mentioned above and creates a bar chart to illustrate the total loan amount by sector in those countries.

#Keep loans whose GDP per capita falls in the three most common ranges
kiva.wdi.df[(kiva.wdi.df$GDP_per_capita > 500 &
             kiva.wdi.df$GDP_per_capita < 700) |
            (kiva.wdi.df$GDP_per_capita > 1000 &
             kiva.wdi.df$GDP_per_capita < 1500)|
            (kiva.wdi.df$GDP_per_capita > 2900 &
             kiva.wdi.df$GDP_per_capita < 3500)
            ,]->kiva.wdi.df2


#Bar chart: total loan amount by sector (each bar stacks the individual loan amounts)
#Rows with a missing sector are dropped, as in the growth rate analysis
kiva.wdi.df2[!is.na(kiva.wdi.df2$sector),]%>%
  ggplot(aes(x=sector, y=loan_amount)) +
  geom_bar(stat="identity") +
  xlab("Sector")+ 
  ylab("Total Loan Amount")+
  coord_flip()

Findings

  • In those countries, the top three sectors in terms of total loan amount are still food, retail and agriculture.

The following code looks at the same data another way, using a boxplot.

#Create a boxplot to show the median, quartiles and range of loan amount in each sector
#Note: the y-axis limits drop loans above 2500 before the box statistics are computed
kiva.wdi.df2[!is.na(kiva.wdi.df2$sector),]%>%
  ggplot(aes(x=sector,y=loan_amount))+
  geom_boxplot(outlier.size = 0)+ 
  scale_y_continuous(limits=c(0,2500))+
  coord_flip()+
  xlab ("Sector") +
  ylab ("Loan Amount")

Findings

  • Of all sectors, personal use has the highest median loan amount, at about 600 USD.
  • The median loan amounts across sectors range from approximately 400 USD to 600 USD.

Summary

  • The loan amount by sector analysis above shows that, in the targeted countries, the food, retail and agriculture sectors receive the largest total loan amounts, which suggests they are the most likely to need funding support. Lenders could concentrate on lending in these three sectors in order to alleviate poverty and improve the economy.

Defaulted Loan Count and Percentage by Sector

The following code creates a bar chart to illustrate the number of defaulted loans in each sector.

#Keep rows where the delinquent column is not NA (used here to identify defaulted loans)
kiva.wdi.df[!is.na(kiva.wdi.df$delinquent),]->kiva.wdi.df3

#Bar chart: Number of defaulted loans by sector
kiva.wdi.df3%>%
  group_by(sector)%>%
  summarize(count = n())%>% 
  ggplot(aes(x=sector, y=count)) +
  geom_bar(stat="identity")+
  xlab("Sector")+ 
  ylab("Number of defaulted loans")+
  coord_flip()

Findings

  • In terms of the number of defaulted loans, the top three sectors are food, retail and agriculture. The numbers of defaulted loans were around 3000, 2300 and 2000 respectively.

However, in order to determine which sectors carry the highest risk of default, it is necessary to look at defaulted loans as a percentage of all loans in each sector, as the following code does.

#Count defaulted loans per sector
a.df <- kiva.wdi.df3 %>%
  group_by(sector) %>%
  summarize(Number_of_Defaulted_Loans = n())

#Count all loans per sector
b.df <- kiva.wdi.df %>%
  group_by(sector) %>%
  summarize(Total_Number_of_Loans = n())

b.df <- b.df[!is.na(b.df$sector),] #Remove the row for missing sector

#Join the two data frames by sector, so the counts line up even if
#the rows are in a different order or a sector has no defaulted loans
c.df <- inner_join(a.df, b.df, by = "sector")
#Add a column for the share of defaulted loans (a proportion between 0 and 1)
c.df$Percentage <- c.df$Number_of_Defaulted_Loans / c.df$Total_Number_of_Loans

#Bar chart: Share of defaulted loans by sector
c.df %>%
  ggplot(aes(x=sector, y=Percentage)) +
  geom_bar(stat="identity")+
  xlab("Sector")+ 
  ylab("Percentage of defaulted loans")+
  coord_flip()  

Findings

  • Compared to the previous graph (number of defaulted loans by sector), the ranking is quite different. The three sectors with the highest loan default rates are health (4.5%), entertainment (3.8%) and clothing (3.5%).
  • Lenders who fund loans in these sectors should be aware of the higher risk and may need to assess the borrowers’ repayment ability more carefully.

Conclusion

Our analysis provides insights that may help current and potential lenders direct funds to areas where the data suggest the greatest impact, or avoid areas where the chance of default is greatest.

Key takeaways include: