Kiva is a non-profit organization that allows people to lend money via the Internet to low-income/underserved entrepreneurs and students in 82 countries. Kiva’s mission is to connect people through lending to alleviate poverty.
Kiva itself does not collect any interest on the loans it facilitates and Kiva lenders do not earn interest on loans. Kiva is purely supported by grants, loans, and donations from its users, corporations, and national institutions.
The Kiva API is a platform that provides access to Kiva public data, such as loan data and lender data. The analysis in this report primarily makes use of the loan data.
WDI (World Development Indicators) presents the most current and accurate global development data available, and includes national, regional and global estimates.
One objective of this analysis is to help Kiva and its funders better alleviate poverty by:
* showing which countries, areas and sectors loans mainly come from in terms of loan amount;
* displaying the loan amount distribution and loan status characteristics by gender.
Another objective of this report is to provide information about defaulted loans by sector so that Kiva and its funders can prepare possible solutions to reduce or avoid those risks.
The following list is a subset of variables that we are interested in.
## [1] "status" "sector"
## [3] "country_code" "country"
## [5] "loan_amount" "gender"
## [7] "posted_year" "GDP_annual_growth_rate"
## [9] " GDP_per_capita"
Let’s take a closer look at each variable. For simplicity, the variables will be defined in the same order they appear in the data frame.
The status of a loan is an especially important concept to understand when working with loans. Below is a list of loan statuses and what each means:
The following list shows levels of status.
## [1] "N/A" "defaulted" "expired" "funded"
## [5] "fundraising" "in_repayment" "paid" "refunded"
defaulted: Occasionally, a borrower or a field partner may fail to make payments on a loan, either to the field partner or to Kiva, respectively. Usually when this happens, a loan simply becomes delinquent and remains in the in_repayment status. When a loan remains delinquent 6 months after the end of the loan payment schedule, the loan becomes defaulted. Defaulted loans are often never paid back and are a financial loss to the lenders of that loan. Most loans only default in part, but it is possible for the entire amount of the loan to not be repaid.
expired: On Kiva, if a loan doesn’t fully fund within 30 days of being listed on the website, it “expires.” This means three things:
funded: This status indicates that the loan request has been completely funded and is not available for new loans by lenders. The loan may be waiting for disbursal to the borrower(s), assuming that the loan was posted to Kiva before the field partner disbursed the funds. The field funded_date shows the exact time at which the loan was fully funded by Kiva.
fundraising: This status indicates that the loan has not yet been funded. This typically represents the type of loan that is advertised on the front page of Kiva.org and on the Lend tab. Lenders can only lend to loans that are in the fundraising status. The variable funded_amount contains the amount that has been funded so far.
in_repayment: When a loan is in repayment, the loan has been disbursed to the borrower(s) and they are in the process of using the funds and making payments on the loan to the field partner. Loans in this status may see journal updates and lenders of this loan will receive repayments when the borrower’s payments are reconciled by Kiva.
paid: This status indicates that the loan has been paid back in full by the borrower. The payments have been distributed back to the lenders and the loan is closed to most new activity.
refunded: Although rare, Kiva may sometimes need to refund the funded portion of the loan to the lenders after the loan has been partially funded, fully funded, or even during repayment. There are many reasons why a loan could be refunded, but usually it is because there is an error with the loan posting or the loan itself has been found to violate Kiva’s policies. Currently, it is not possible to search for loans with this status.
Some of the terms used in the Kiva API for loan status do not always align with the terminology used on the website. Although the terms used in the API are clear and acceptable for communicating to lenders, the terminology on the site may be used as well.
See Kiva Loans Documentation for additional details.
The following list shows sector names.
## [1] "Agriculture" "Arts" "Clothing" "Construction"
## [5] "Education" "Entertainment" "Food" "Health"
## [9] "Housing" "Manufacturing" "Personal Use" "Retail"
## [13] "Services" "Transportation" "Wholesale"
The following list shows the country codes.
## [1] "AF" "AL" "AM" "AZ" "BA" "BF" "BG" "BI" "BJ" "BO" "BR" "BW" "BZ" "CL"
## [15] "CM" "CN" "CO" "CR" "DO" "EC" "GE" "GH" "GT" "HN" "HT" "ID" "IL" "IN"
## [29] "IQ" "JO" "KE" "KH" "LA" "LB" "LK" "LR" "LS" "MD" "MG" "ML" "MN" "MR"
## [43] "MW" "MX" "MZ" "NA" "NG" "NI" "NP" "PA" "PE" "PG" "PH" "PK" "PY" "RW"
## [57] "SB" "SL" "SN" "SO" "SR" "SV" "TD" "TG" "TH" "TJ" "TL" "TR" "TZ" "UA"
## [71] "UG" "US" "VN" "VU" "WS" "XK" "ZA" "ZM" "ZW"
This list corresponds directly to the country variable.
The following list shows the first five country names in the data.
## Source: local data frame [5 x 1]
##
## country
## (chr)
## 1 Azerbaijan
## 2 Vietnam
## 3 Vietnam
## 4 Mexico
## 5 Ecuador
Simple statistics for the loan_amount variable:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 25.0 325.0 550.0 746.2 950.0 10000.0
Simple statistics for the gender variable:
## Source: local data frame [2 x 4]
##
## gender count mean med
## (chr) (int) (dbl) (dbl)
## 1 F 707135 807.3935 500
## 2 M 242359 942.9649 700
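A table of this shape could be produced roughly as follows. This is a minimal sketch in base R on a small made-up data frame (`toy.df` and its values are hypothetical), and it assumes that the `mean` and `med` columns summarise `loan_amount`, which the output above does not state explicitly.

```r
# Toy stand-in for the loan data; the values are invented for illustration.
toy.df <- data.frame(
  gender = c("F", "F", "F", "M", "M"),
  loan_amount = c(200, 500, 800, 600, 1000),
  stringsAsFactors = FALSE
)

# Count, mean and median of loan_amount for each gender.
by.gender <- do.call(rbind, lapply(split(toy.df, toy.df$gender), function(d) {
  data.frame(gender = d$gender[1],
             count = nrow(d),
             mean = mean(d$loan_amount),
             med = median(d$loan_amount),
             stringsAsFactors = FALSE)
}))
print(by.gender)
```

The same grouping could also be written with `group_by()` and `summarize()` from dplyr, which matches the style used later in this report.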
Simple statistics for the GDP_annual_growth_rate variable:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -14.800 3.307 5.840 5.314 7.632 34.500
Simple statistics for the GDP_per_capita variable:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 147.1 588.3 1273.0 1858.0 2816.0 44320.0
For access to the script used to clean this data, use the following link: DropBox Link
Below is an overview of the steps taken in order to prepare the data.
Select variables of interest. See Variables of Interest.
Load the three raw data sets from Kiva.org and limit the data to only the variables of interest.
Bind data sets into one, called kiva.wdi.df.
Save the resulting data frame to a file. This is the data frame that will be used for the analysis and research contained in this report.
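The steps above can be sketched as follows. This is only an illustration under stated assumptions: the three `raw` data frames, their contents, and the output file are hypothetical stand-ins, only four of the variables of interest are used for brevity, and the sketch does not cover how the WDI columns were attached, since the report does not describe that step.

```r
# Variables of interest (a subset of the list above, for brevity).
vars <- c("status", "sector", "country_code", "loan_amount")

# Three toy "raw" data sets standing in for the real files (hypothetical).
raw1 <- data.frame(id = 1:2, status = c("paid", "funded"),
                   sector = c("Food", "Retail"),
                   country_code = c("PE", "UG"),
                   loan_amount = c(500, 325), stringsAsFactors = FALSE)
raw2 <- data.frame(id = 3:4, status = c("paid", "defaulted"),
                   sector = c("Agriculture", "Health"),
                   country_code = c("KH", "PH"),
                   loan_amount = c(950, 550), stringsAsFactors = FALSE)
raw3 <- data.frame(id = 5, status = "in_repayment",
                   sector = "Food", country_code = "KE",
                   loan_amount = 700, stringsAsFactors = FALSE)

# Keep only the variables of interest, then stack the pieces row-wise.
pieces <- lapply(list(raw1, raw2, raw3), function(d) d[, vars])
toy.kiva.df <- do.call(rbind, pieces)

# Save the combined data frame for later analysis (temporary file here).
out.file <- tempfile(fileext = ".rds")
saveRDS(toy.kiva.df, out.file)
```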
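The following code creates a world map to illustrate the total loan amount that comes from each country.
library(dplyr) #Provides select() and %>%
library(ggplot2) #Provides scale_fill_continuous()
library(choroplethr)
library(choroplethrMaps)
# Sum total loan amount per country on map.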
country.df <- select(kiva.wdi.df, country_code, loan_amount) #Handpick variables
data(country.regions) #Import country naming convention required
sumloan.df <- aggregate(country.df$loan_amount, by=list(country.df$country_code), FUN=sum) #Group by country and sum loan amount
colnames(sumloan.df)[1] <- "iso2c" #Rename country code to facilitate merge
colnames(sumloan.df)[2] <- "value" #Rename sum to value (required by choropleth)
goodData.df <- merge(sumloan.df, country.regions) #Merge dataframes
goodData.df <- goodData.df[,-1] #Remove country code
gg <- country_choropleth(goodData.df,
title = "Kiva Loan Totals",
legend = "$",
num_colors=1) #Build plot
gg + scale_fill_continuous(low="#eff3ff", high="#084594", na.value="grey")
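One thing worth checking with the merge above: by default `merge()` keeps only rows whose key appears in both data frames, so any country code missing from the lookup table is dropped without a warning. The sketch below shows one way to list such codes. The lookup table `regions.toy` is a hypothetical stand-in for `country.regions`, and the totals are invented; whether any real Kiva code is affected is not checked here.

```r
# Toy per-country totals (invented values).
sumloan.toy <- data.frame(iso2c = c("PE", "UG", "XK"),
                          value = c(1500, 900, 400),
                          stringsAsFactors = FALSE)

# Hypothetical lookup table standing in for country.regions.
regions.toy <- data.frame(iso2c = c("PE", "UG"),
                          region = c("peru", "uganda"),
                          stringsAsFactors = FALSE)

# Inner join on the shared column iso2c, as in the code above.
merged.toy <- merge(sumloan.toy, regions.toy)

# Codes present in the totals but absent after the merge.
dropped <- setdiff(sumloan.toy$iso2c, merged.toy$iso2c)
print(dropped)
```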
Findings
The following code creates a violin plot to show the distribution of loan amounts by gender.
#Copy the gender and loan_amount columns of loans.df into the loan_gender data frame
loans.df %>% select(gender, loan_amount) -> loan_gender
loan_gender <- na.omit(loan_gender) #Remove rows containing NA
#Create violin plot
loan_gender2 <- ggplot(loan_gender, aes(x=gender, y=loan_amount))
loan_gender2 + geom_violin() + scale_y_continuous(limits=c(0,3000))
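Because `scale_y_continuous(limits = ...)` removes values outside the limits before the violin is drawn, it can be useful to know how many loans are excluded. A minimal sketch with invented amounts (`toy.amounts` is hypothetical; only the 3000 limit comes from the code above):

```r
# Invented loan amounts for illustration.
toy.amounts <- c(25, 325, 550, 950, 3500, 10000)
y.limit <- 3000  # Upper limit used in the violin plot

# Flag loans that the scale limits would exclude.
outside <- toy.amounts > y.limit
share.outside <- mean(outside)
cat(sum(outside), "of", length(toy.amounts), "loans fall above the limit\n")
```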
Findings
The following code creates a mosaic plot to show the volume of loans by gender and status.
#Pick the gender and status columns and recode status into three groups: Paid, Defaulted (defaulted or expired) and Other
loans.df %>% select(gender, status) %>%
mutate(paid = ifelse(status=='paid', 'Paid',
ifelse(status=='defaulted' | status=='expired','Defaulted','Other'))) %>%
na.omit -> stts #Remove N/A
Palette.r <- c("#DAA520","#4169E1","#8B008B") #One color per status group
#Create mosaic plot
mosaicplot( gender ~ paid, data = stts, color = Palette.r,
main="",xlab="Gender",ylab="Status", las=1)
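The tile areas of a mosaic plot correspond to cell counts, so the same comparison can be read numerically from a table of row proportions. The following is a small sketch on invented data (`toy.stts` is hypothetical and only mimics the `gender` and `paid` columns built above):

```r
# Invented records mimicking the gender and paid columns.
toy.stts <- data.frame(
  gender = c("F", "F", "F", "F", "M", "M", "M", "M"),
  paid = c("Paid", "Paid", "Paid", "Defaulted",
           "Paid", "Paid", "Defaulted", "Other"),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then convert each row (gender) to proportions.
counts <- table(toy.stts$gender, toy.stts$paid)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```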
Findings
The following code creates a density plot to illustrate GDP annual growth rate distribution in terms of probability.
#Limit GDP annual growth rate to a range between 0% and 15%
kiva.wdi.df_d1 <- kiva.wdi.df[kiva.wdi.df$GDP_annual_growth_rate > 0
& kiva.wdi.df$GDP_annual_growth_rate < 15,]
#Line chart: Density plot for distribution of GDP annual growth rate
kiva.wdi.df_d1 %>%
ggplot(aes(x=GDP_annual_growth_rate)) +
geom_line(stat='density',
color='red')+
scale_x_continuous(breaks = 0:15)+ #Growth rate is continuous, so use a continuous scale
xlab("GDP Annual Growth Rate")+
ylab("Density")
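The peaks of such a density curve can also be located numerically rather than read off the chart. The sketch below uses base R's `density()` on simulated values and looks for points where the slope of the estimate changes from positive to negative. The data (`toy.growth`) are invented and are not WDI values, and this is one possible approach rather than the method used in this report.

```r
set.seed(1)
# Simulated growth rates around two invented centres (not real WDI values).
toy.growth <- c(rnorm(300, mean = 3.5, sd = 0.4),
                rnorm(300, mean = 6.5, sd = 0.4))

dens <- density(toy.growth)  # Gaussian kernel density estimate

# A grid point is a local maximum when the slope changes from + to -.
slope.sign <- sign(diff(dens$y))
peak.idx <- which(diff(slope.sign) < 0) + 1
peaks <- data.frame(x = dens$x[peak.idx], density = dens$y[peak.idx])

# Order the peaks from highest to lowest density.
peaks <- peaks[order(-peaks$density), ]
print(head(peaks, 3))
```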
Findings
Thus, we will focus on countries whose GDP annual growth rates fall in these three ranges and analyze the distribution of total loan amounts across sectors in those countries, because those countries are representative in terms of economic development.
The following code selects countries with a GDP annual growth rate in the ranges mentioned above and creates a bar chart to illustrate the total loan amount by sector in those countries.
#Choose countries with GDP annual growth rate in the three ranges with the highest density
kiva.wdi.df[(kiva.wdi.df$GDP_annual_growth_rate < 7 &
kiva.wdi.df$GDP_annual_growth_rate > 6) |
(kiva.wdi.df$GDP_annual_growth_rate > 3 &
kiva.wdi.df$GDP_annual_growth_rate < 4) |
(kiva.wdi.df$GDP_annual_growth_rate > 7.2 &
kiva.wdi.df$GDP_annual_growth_rate < 8)
,]->kiva.wdi.df1
#Create a bar chart to show total loan amount by sector in those countries
kiva.wdi.df1[!is.na(kiva.wdi.df1$sector),]%>%
ggplot(aes(x=sector, y=loan_amount)) +
geom_bar(stat="identity") + #Stacks individual loans, so bar length is the sector total
xlab("Sector")+
ylab("Total Loan Amount")+
coord_flip()
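With `stat="identity"`, the chart stacks one segment per loan, so each bar's length equals the sum of `loan_amount` for that sector. The same totals can be computed directly, which makes it easier to rank sectors. A minimal sketch on invented data (`toy.loans` is hypothetical):

```r
# Invented loans for illustration.
toy.loans <- data.frame(
  sector = c("Food", "Food", "Retail", "Agriculture", "Retail"),
  loan_amount = c(400, 600, 500, 550, 450),
  stringsAsFactors = FALSE
)

# Sum loan_amount within each sector, then sort from largest to smallest.
sector.totals <- aggregate(loan_amount ~ sector, data = toy.loans, FUN = sum)
sector.totals <- sector.totals[order(-sector.totals$loan_amount), ]
print(sector.totals)
```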
Findings
The following code shows the same data another way, as a boxplot.
#Create a boxplot to show the spread (median, quartiles, etc.) of loan amounts in different sectors
kiva.wdi.df1[!is.na(kiva.wdi.df1$sector),]%>%
ggplot(aes(x=sector,y=loan_amount))+
geom_boxplot(outlier.size = 0)+
scale_y_continuous(limits=c(0,2500))+
coord_flip()+
xlab ("Sector") +
ylab ("Loan Amount")
Findings
The following code creates a density plot to illustrate the distribution of GDP per capita in terms of probability.
#Limit GDP per capita to a range from 0 to 10000 in USD
kiva.wdi.df_d2 <- kiva.wdi.df[kiva.wdi.df$GDP_per_capita > 0
& kiva.wdi.df$GDP_per_capita< 10000,]
#Line chart: Density plot for distribution of GDP per capita
kiva.wdi.df_d2 %>%
ggplot(aes(x=GDP_per_capita)) +
geom_line(stat='density',
color='red')+
scale_x_continuous(breaks = c(0,500, 1000, 1500, 2500, 3500, 5000, 7500, 10000)) #GDP per capita is continuous
Findings
Therefore, we will focus on countries whose GDP per capita falls in these three ranges and analyze the distribution of total loan amount across sectors in those countries, because those countries are representative in terms of economic development per person.
The following code selects countries with GDP per capita in the ranges mentioned above and creates a bar chart to illustrate the total loan amount by sector in those countries.
#Choose countries with GDP per capita in the three ranges with the highest density
kiva.wdi.df[(kiva.wdi.df$GDP_per_capita > 500 &
kiva.wdi.df$GDP_per_capita < 700) |
(kiva.wdi.df$GDP_per_capita > 1000 &
kiva.wdi.df$GDP_per_capita < 1500)|
(kiva.wdi.df$GDP_per_capita > 2900 &
kiva.wdi.df$GDP_per_capita < 3500)
,]->kiva.wdi.df2
#Create a bar chart to show total loan amount by sector in those countries
kiva.wdi.df2[!is.na(kiva.wdi.df2$sector),]%>%
ggplot(aes(x=sector, y=loan_amount)) +
geom_bar(stat="identity") + #Stacks individual loans, so bar length is the sector total
xlab("Sector")+
ylab("Total Loan Amount")+
coord_flip()
Findings
The following code shows the same data another way, as a boxplot.
#Create a boxplot to show the spread (median, quartiles, etc.) of loan amounts in different sectors
kiva.wdi.df2[!is.na(kiva.wdi.df2$sector),]%>%
ggplot(aes(x=sector,y=loan_amount))+
geom_boxplot(outlier.size = 0)+
scale_y_continuous(limits=c(0,2500))+
coord_flip()+
xlab ("Sector") +
ylab ("Loan Amount")
Findings
Summary
The following code creates a bar chart to illustrate the numbers of defaulted loans by different sectors.
#Remove N/A in delinquent column
kiva.wdi.df[!is.na(kiva.wdi.df$delinquent),]->kiva.wdi.df3
#Bar chart: Number of defaulted loans by sector
kiva.wdi.df3%>%
group_by(sector)%>%
summarize(count = n())%>%
ggplot(aes(x=sector, y=count)) +
geom_bar(stat="identity")+
xlab("Sector")+
ylab("Number of defaulted loans")+
coord_flip()
Findings
However, in order to determine which sectors carry the highest risk of default, it is necessary to look at defaulted loans as a share of all loans in each sector, as the following code does.
#Group the number of defaulted loans by sector
kiva.wdi.df3%>%
group_by(sector)%>%
summarize(Number_of_Defaulted_Loans = n())%>%
{.}->a.df
#Group total number of loans by sector
kiva.wdi.df%>%
group_by(sector)%>%
summarize(Total_Number_of_Loans = n())%>%
{.}->b.df
b.df[!is.na(b.df$sector),]->b.df #Remove N/A
#Join the two data frames by sector so that the counts stay aligned
c.df <- merge(a.df, b.df, by="sector")
#Add one new column for the share of defaulted loans (a fraction between 0 and 1)
c.df$Percentage <- c.df$Number_of_Defaulted_Loans/c.df$Total_Number_of_Loans
#Bar chart: Share of defaulted loans by sector
c.df%>%
ggplot(aes(x=sector, y=Percentage)) +
geom_bar(stat="identity")+
xlab("Sector")+
ylab("Percentage of defaulted loans")+
coord_flip()
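The default share is simply the number of defaulted loans divided by the total number of loans in the same sector. The worked example below uses invented counts (`defaulted.toy` and `total.toy` are hypothetical) and also covers a case the real data may or may not contain: a sector with no defaulted loans, which only appears in the totals. Keeping all rows of the totals in the join and filling the missing count with zero gives that sector a share of 0 instead of dropping it.

```r
# Invented counts for illustration.
defaulted.toy <- data.frame(sector = c("Clothing", "Health"),
                            Number_of_Defaulted_Loans = c(3, 2),
                            stringsAsFactors = FALSE)
total.toy <- data.frame(sector = c("Arts", "Clothing", "Health"),
                        Total_Number_of_Loans = c(50, 60, 20),
                        stringsAsFactors = FALSE)

# Join by sector, keeping every sector that has any loans.
rates.toy <- merge(defaulted.toy, total.toy, by = "sector", all.y = TRUE)

# Sectors with no defaulted loans have NA after the join; treat them as 0.
no.defaults <- is.na(rates.toy$Number_of_Defaulted_Loans)
rates.toy$Number_of_Defaulted_Loans[no.defaults] <- 0

# Share of defaulted loans per sector (a fraction between 0 and 1).
rates.toy$Percentage <- rates.toy$Number_of_Defaulted_Loans /
  rates.toy$Total_Number_of_Loans
print(rates.toy)
```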
Findings
Our analysis helps provide insights that may influence current and potential investors to invest in more statistically impactful areas or avoid areas where chance of default is greatest.
Key takeaways include:
The majority of loans come from South America (e.g. Peru), Africa (e.g. Uganda), and Southeast Asia (e.g. Cambodia and the Philippines). Kiva could pay more attention to regional loan needs. Providing more timely loans for people in those areas and countries would generally improve the effectiveness of poverty alleviation.
Women tend to take out smaller loans than men, and men appear to default on loans more often than women. Thus, more small loans could be provided for women borrowers. For men borrowers, Kiva could assess repayment ability before issuing loans in order to reduce or avoid default risk.
In countries that are representative in terms of economic development, the Food, Retail and Agriculture sectors are more likely to require funding support, because total loan amounts in these three sectors are the highest in the sector list. Moreover, the most common loan amounts range from 400 to 600 USD. If Kiva focuses on these sectors, the impact on poverty alleviation could be clearly visible at the sector level.
Lenders who issue loans in higher-risk sectors, including Health, Entertainment and Clothing, should be aware of the higher risks and may need to assess borrowers’ repayment ability more carefully.