1 Background

1.1 What’s Kiva .org ?

Based on Wikipedia, Kiva (commonly known by its domain name, Kiva.org) is a 501(c)(3) non-profit organization headquartered in San Francisco, California that allows people to lend money via the Internet to low-income entrepreneurs and students in 77 countries. Kiva’s mission is “to expand financial access to help underserved communities thrive.”

Since 2005, Kiva has crowd-funded more than 1.6 million loans, totaling over $1.33 billion, with a repayment rate of 95.8 percent. Over 1.8 million lenders worldwide use the Kiva platform. Lenders do not receive interest on the money they lend. Kiva includes personal stories of each person who applies for a loan so that lenders can connect with borrowers on a human level.

Kiva relies on a network of field partners to administer the loans on the ground. These field partners can be microfinance institutions, social impact businesses, schools or non-profit organizations. Borrowers pay interest on most loans to the field partners, and the field partners are charged small fees by Kiva. Kiva is supported by grants, loans, and donations from its users, corporations, and national institutions.

1.2 What will we do?

The kiva dataset contains loan data from several kiva customers in several countries. With this data we will do an exploratory data analysis (EDA) and a little visualization based on some of the business questions that have been made. before doing EDA, it would be better if we recognize our data first. The following is a list of column names along with their explanations that can make it easier for us to analyze the data.

Here are the list of column names and their explanation.

id: Unique ID for the loan (Loan ID)
funded_amount: The amount disbursed by Kiva to the field agent (USD)
loan_amount: The amount disbursed by the field agent to the borrower (USD)
activity: Activity type as a more specific category
sector: High-level category of the loan
country: Full country name of the country in which loan was disbursed
region: Full region name within the country
currency: The currency in which the loan was disbursed
partner_id: ID of the partner organization
posted_time: The time at which the loan is posted on Kiva by the field agent
funded_time: The time at which the loan posted to Kiva gets funded by lenders completely
term_in_months: The duration for which the loan was disbursed in months
lender_count: The total number of lenders that contributed to each loan
repayment_interval: Interval for the repayment of the loan

2 Input Data

The first thing we do is input the loan_kiva.csv data that we saved. using the read_csv () function. and will be saved with the name dataset so that the following dataframe is obtained

dataset <- read.csv("loan_kiva.csv")
head(dataset)

3 Data Wrangling

We will do checking data and transform data.

3.1 Checking Data

In this section we will check whether our data contains missing values or not, and check the data type in each column. This step is important because if the data still contains missing values or the data type is not suitable, it will make it difficult for us to perform operations on the dataframe that we have created.

colSums(is.na(dataset))

##                 id      funded_amount        loan_amount           activity 
##                  0                  0                  0                  0 
##             sector            country             region           currency 
##                  0                  0                  0                  0 
##         partner_id        posted_time        funded_time     term_in_months 
##                  0                  0                  0                  0 
##       lender_count repayment_interval 
##                  0                  0

It means that our dataset does not have any missing values.The addition, I want to check the summary and str from dataset so I know the statistic of each columns and I can guess is the data type correct or not.

Use summary() to check the summary

summary(dataset)

##        id          funded_amount      loan_amount        activity        
##  Min.   : 823031   Min.   :   25.0   Min.   :   25.0   Length:165040     
##  1st Qu.: 867063   1st Qu.:  275.0   1st Qu.:  275.0   Class :character  
##  Median : 913180   Median :  475.0   Median :  475.0   Mode  :character  
##  Mean   : 913167   Mean   :  806.9   Mean   :  806.9                     
##  3rd Qu.: 958990   3rd Qu.:  950.0   3rd Qu.:  950.0                     
##  Max.   :1002884   Max.   :50000.0   Max.   :50000.0                     
##     sector            country             region            currency        
##  Length:165040      Length:165040      Length:165040      Length:165040     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    partner_id    posted_time        funded_time        term_in_months  
##  Min.   :  9.0   Length:165040      Length:165040      Min.   :  2.00  
##  1st Qu.:126.0   Class :character   Class :character   1st Qu.:  8.00  
##  Median :145.0   Mode  :character   Mode  :character   Median : 12.00  
##  Mean   :172.9                                         Mean   : 13.43  
##  3rd Qu.:201.0                                         3rd Qu.: 14.00  
##  Max.   :469.0                                         Max.   :158.00  
##   lender_count     repayment_interval
##  Min.   :   1.00   Length:165040     
##  1st Qu.:   8.00   Class :character  
##  Median :  14.00   Mode  :character  
##  Mean   :  21.63                     
##  3rd Qu.:  26.00                     
##  Max.   :1605.00

Use str() to check the structure of dataframe

str(dataset)

## 'data.frame':    165040 obs. of  14 variables:
##  $ id                : int  823048 823138 823051 823143 823154 823117 823100 823132 823137 823050 ...
##  $ funded_amount     : int  450 200 175 675 1450 250 450 900 225 250 ...
##  $ loan_amount       : int  450 200 175 675 1450 250 450 900 225 250 ...
##  $ activity          : chr  "Butcher Shop" "Food Production/Sales" "Fruits & Vegetables" "Fishing" ...
##  $ sector            : chr  "Food" "Food" "Food" "Food" ...
##  $ country           : chr  "Philippines" "Philippines" "Philippines" "Philippines" ...
##  $ region            : chr  "Kabankalan, Negros Occidental" "Baybay, Leyte" "Kabankalan, Negros Occidental" "Carles, Iloilo" ...
##  $ currency          : chr  "PHP" "PHP" "PHP" "PHP" ...
##  $ partner_id        : int  145 125 145 125 202 145 245 125 125 145 ...
##  $ posted_time       : chr  "2015-01-01 02:38:53" "2015-01-01 15:57:25" "2015-01-01 02:49:11" "2015-01-01 16:20:43" ...
##  $ funded_time       : chr  "2015-01-01 16:20:50" "2015-01-05 00:21:31" "2015-01-01 06:12:18" "2015-01-05 19:21:50" ...
##  $ term_in_months    : int  8 10 8 20 11 8 14 12 12 5 ...
##  $ lender_count      : int  14 5 4 26 45 7 16 19 7 7 ...
##  $ repayment_interval: chr  "irregular" "irregular" "irregular" "irregular" ...

3.2 Transform Data

From the previous step, we know that the data not have missing value but there are several columns with the wrong data type. So in this part we do Transforming data by changing the wrong data type to correct ones.

changing type to factor using as.factor

dataset[,c("sector","country","region","currency","repayment_interval")] <-
  lapply(dataset[,c("sector","country","region","currency","repayment_interval")],
         as.factor)

Changing type to date using package lubridate Before that, we must call the lubridate with function library()

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

and now we can use the package

dataset$posted_time <- ymd_hms(dataset$posted_time)
dataset$funded_time <- ymd_hms(dataset$funded_time)

Let’s check data structure again to see that the data type have already changed.

str(dataset)

## 'data.frame':    165040 obs. of  14 variables:
##  $ id                : int  823048 823138 823051 823143 823154 823117 823100 823132 823137 823050 ...
##  $ funded_amount     : int  450 200 175 675 1450 250 450 900 225 250 ...
##  $ loan_amount       : int  450 200 175 675 1450 250 450 900 225 250 ...
##  $ activity          : chr  "Butcher Shop" "Food Production/Sales" "Fruits & Vegetables" "Fishing" ...
##  $ sector            : Factor w/ 15 levels "Agriculture",..: 7 7 7 7 1 7 12 1 7 7 ...
##  $ country           : Factor w/ 77 levels "Albania","Armenia",..: 53 53 53 53 29 53 48 53 53 53 ...
##  $ region            : Factor w/ 6374 levels "","\"\"The first May\"\" village",..: 2356 686 2356 1118 6233 5626 2257 686 686 2356 ...
##  $ currency          : Factor w/ 62 levels "ALL","AMD","AZN",..: 40 40 40 40 22 40 41 40 40 40 ...
##  $ partner_id        : int  145 125 145 125 202 145 245 125 125 145 ...
##  $ posted_time       : POSIXct, format: "2015-01-01 02:38:53" "2015-01-01 15:57:25" ...
##  $ funded_time       : POSIXct, format: "2015-01-01 16:20:50" "2015-01-05 00:21:31" ...
##  $ term_in_months    : int  8 10 8 20 11 8 14 12 12 5 ...
##  $ lender_count      : int  14 5 4 26 45 7 16 19 7 7 ...
##  $ repayment_interval: Factor w/ 3 levels "bullet","irregular",..: 2 2 2 2 1 2 3 2 2 2 ...

head(dataset)

4 Answering Business Question

What are the top 3 sectors that most often make loans to kiva ? (Top 3 sector apa yang paling sering melakukan pinjaman kepada kiva ?)

most_sector <- as.data.frame(table(dataset$sector))
most_sector <- most_sector[order(most_sector$Freq, decreasing = T),]
most_sector

most_sector[1:3,]

Answer : Top 3 sector are Agriculture, Food, and Retail

The amount of loan money given by kiva to agents in kenya for the type of repayment_interval monthly based on the sector? (Jumlah uang pinjaman yang diberikan kiva kepada agen di kenya untuk jenis jangka waktu pengembalian monthly berdasarkan sectornya?)

kenya <- dataset[dataset$country == "Kenya",]

sum_of_funded <- aggregate(funded_amount ~ sector + repayment_interval,
                                  kenya[kenya$repayment_interval == "monthly",], 
                                  sum)
sum_of_funded <- sum_of_funded[order(sum_of_funded$funded_amount,
                                                   decreasing = T),]
sum_of_funded

sum(sum_of_funded$funded_amount)

## [1] 5073800

library(ggplot2)
ggplot(data = sum_of_funded, 
       mapping = aes(x = funded_amount,y = reorder(sector, funded_amount)))+
  geom_col(mapping = aes(fill = funded_amount), show.legend = F)+
   geom_col(fill = "darkgoldenrod2", data = sum_of_funded[1:3,])+
  scale_fill_gradient(low = "tan4", high = "cornsilk")+
  labs(title = "Funded Amount of Kiva to Agent in Kenya", 
       subtitle = "based on sector", 
       x = "Funded Amount",
       y= NULL)

Insight:

The largest amount of loans was given to the Agriculture sector
The least amount of loans is given to the Wholesale sector

On average, how many days are lending in Kenya to the retail sector for each repayment_interval? (rata - rata berapa hari pemberian pinjaman di negara kenya pada sector retail untuk setiap repayment_interval?)

kenya$funding_days <- difftime(kenya$funded_time, kenya$posted_time,
                               units = "days")
head(kenya)

kenya_agg <- aggregate(funding_days ~ repayment_interval + sector, 
                       kenya,
                       mean)
kenya_agg <- kenya_agg[order(kenya_agg$funding_days, decreasing = T),]
kenya_agg

ggplot(data = kenya_agg, mapping = aes(x = funding_days, y = sector ))+
  geom_col(aes(fill = repayment_interval), position = "dodge")+
  labs(title = "Average Funding Duration on Each Repyament Interval and Sector", 
       x = "Funded Amount",
       y= NULL)

In what month does Kiva serve the largest number of loans in the Agriculture sector? (pada bulan apa kiva melayani pemberian pinjaman disector Agriculture terbanyak?)

dataset$month <- month(dataset$posted_time, label = T)
head(dataset)

agg <- as.data.frame(table(dataset$sector, dataset$month))
agg <- agg[agg$Var1 == "Agriculture",]
agg <- agg[order(agg$Freq, decreasing = T),]
agg

Answer : Kiva serve the largest number of loans in the Agriculture sector was in March

On month and repayment_interval what is the fastest financing duration and in which country? (pada bulan dan jangka waktu pengambilan apa durasi pembiayaan tercepat dan di negara mana?)

dataset$funding_duration <- difftime(dataset$funded_time,dataset$posted_time,
                                     units = "hours")
head(dataset)

loan_agg <- aggregate(funding_duration ~ month + repayment_interval + country,
                      dataset, 
                      mean)
loan_agg <- loan_agg[order(loan_agg$funding_duration, decreasing = F),]
head(loan_agg)

Based on the results of the analysis where the fastest funding_duration is in the country of Lesotho, it will be seen that the trending funding_duration is in Lesotho.

lesotho <- dataset[dataset$country == "Lesotho",]
head(lesotho)

leshoto_agg <- aggregate(funding_duration ~ month + repayment_interval, lesotho,
                         mean)
lesotho_agg <- lesotho[order(lesotho$funding_duration, decreasing = T),]
head(lesotho_agg)

ggplot(leshoto_agg, aes(x =month, y = funding_duration, 
                        color = repayment_interval,
                        group = repayment_interval))+
  geom_line()+
  geom_point()+
   labs(title = "Funding Duration Trend on Lesotho",
       x = NULL, y = "Funding Duration",
       color = "Repayment Interval")+
   theme_minimal()+
   theme(legend.position = "top")

what is the sum of loan_amount for each sector in all countries? (berapa jumlah loan_amount untuk setiap sektor di semua negara?)

jml_loanamount <- as.data.frame(xtabs(formula = loan_amount ~ sector, 
                                      data = dataset))
jml_loanamount <- jml_loanamount[order(jml_loanamount$Freq, 
                                       decreasing = T),]
jml_loanamount

ggplot(jml_loanamount, 
       aes(x = reorder(sector, Freq),
           y=Freq))+
  geom_col()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Jumlah loan_amount untuk setiap sector",
        x = "Sector")

Top 5 customers based on the largest loan amount in all countries (top 5 customer berdasarkan jumlah pinjaman terbesar di semua negara)

top5_id <- as.data.frame(aggregate(loan_amount ~ id, dataset, sum))
top5_id <- top5_id[order(top5_id$loan_amount, decreasing = T),]
top5_id[1:5,]

Top 10 activities that make loans above 500 dollars in the food sector but have a small number of lenders? (Top 10 aktifitas yang melakukan pinjaman diatas 500 dollar di sektor food namun memiliki jumlah pemberi pinjaman yang sedikit?)

food <- dataset[(dataset$sector == "Food")&(dataset$loan_amount > 500),]
food <- food[order(food$lender_count),]
food[1:10,c("activity","loan_amount","lender_count")]

How does the comparison of duration funding in 3 countries (Pakistan, Mexico, and Guatemala) based on repayment_interval? (Bagaimana perbandingan funding duration di 3 negara (Pakistan, Mexico, dan Guatemala) berdasarkan repayment_interval?)

negara <- c("Pakistan", "Mexico", "Guatemala")
funding_trend <- aggregate(funding_duration ~ country + repayment_interval + month,
                           dataset[dataset$country %in% negara,], 
                           mean)
funding_trend

ggplot(data = funding_trend, mapping = aes(x = month, y = funding_duration))+
  geom_line(aes(group = repayment_interval, col = repayment_interval))+
  geom_point(aes(group = repayment_interval, col = repayment_interval))+
  facet_wrap(~country, nrow = 3)+
  labs(title = "Trending funding_duration based on repayment_interval",
       subtitle = "In Guatemala, Mexico, Pakistan",
       x = NULL,
       y = "Funding Duration (in hours)")

insight

In Pakistan more often use repayment_interval in the irregullar and monthly types, while borrowing with repayment_interval bullet only occurs in June and July
In Pakistan the average funding_duration for irregular and monthly is not more than 600 hours
In mexico, the repayment_interval of type bullet is always faster than monthly
As of March, the funding_duration of each repayment_interval in Mexico is almost the same
In July, the funding_duration of the repayment_interval is irregular and monthly in Mexico.
in Guatemala, the repayment_interval is irregular, in terms of disbursement of funds, it is always faster than the repayment_interval of the monthly type
In May, funding_duration between repayment_interval of type bullet and monthly in guatemala is almost the same
The longest disbursement of funds in Guatemala occurred in July with a monthly repayment_interval

How much is the total farming activity loan in Bolivia in 2015? (Berapa total pinjaman activitas farming di bolivia pada tahun 2015)

bolivia <- dataset[(dataset$country == "Bolivia")&(dataset$activity == "Farming"),]
sum(bolivia$loan_amount)

## [1] 83975

Answer : Total loan in farming activity at Bolivia in 2015 is 83.975 USD

5 Conclusion

From the data, it is found that the Agriculture, Food, and Retail sectors are the top 3 sectors that often make loans to Kiva. But now we are interested in seeing Kiva’s customers in Kenya countries. Based on Kiva’s data, it serves loans of up to around 5.073.800 USD with the most borrowing sector is Agriculture and the average lending is around 0-22 days. Loans in the Agriculture sector are often given in March. In contrast to Kenya, the average lending rate in Lesotho countries is the fastest given by Kiva, which is about 4 hours. However, after examining it, it turns out that in a Lesotho country, Kiva only serves repayment_interval of the monthly type.

That is a little insight that can be obtained from the kiva dataset, there are many more insights that can be taken from the kiva dataset, hopefully this article can help readers a little. happy learning everyone ^^

Data Exploration on Kiva Dataset

Ika Sulistiorini

27 March, 2021