Based on Wikipedia, Kiva (commonly known by its domain name, Kiva.org) is a 501(c)(3) non-profit organization headquartered in San Francisco, California that allows people to lend money via the Internet to low-income entrepreneurs and students in 77 countries. Kiva’s mission is “to expand financial access to help underserved communities thrive.”
Since 2005, Kiva has crowd-funded more than 1.6 million loans, totaling over $1.33 billion, with a repayment rate of 95.8 percent. Over 1.8 million lenders worldwide use the Kiva platform. Lenders do not receive interest on the money they lend. Kiva includes personal stories of each person who applies for a loan so that lenders can connect with borrowers on a human level.
Kiva relies on a network of field partners to administer the loans on the ground. These field partners can be microfinance institutions, social impact businesses, schools or non-profit organizations. Borrowers pay interest on most loans to the field partners, and the field partners are charged small fees by Kiva. Kiva is supported by grants, loans, and donations from its users, corporations, and national institutions.
The kiva dataset contains loan data from several kiva customers in several countries. With this data we will do an exploratory data analysis (EDA) and a little visualization based on some of the business questions that have been made. before doing EDA, it would be better if we recognize our data first. The following is a list of column names along with their explanations that can make it easier for us to analyze the data.
Here are the list of column names and their explanation.
id: Unique ID for the loan (Loan ID)funded_amount: The amount disbursed by Kiva to the field agent (USD)loan_amount: The amount disbursed by the field agent to the borrower (USD)activity: Activity type as a more specific categorysector: High-level category of the loancountry: Full country name of the country in which loan was disbursedregion: Full region name within the countrycurrency: The currency in which the loan was disbursedpartner_id: ID of the partner organizationposted_time: The time at which the loan is posted on Kiva by the field agentfunded_time: The time at which the loan posted to Kiva gets funded by lenders completelyterm_in_months: The duration for which the loan was disbursed in monthslender_count: The total number of lenders that contributed to each loanrepayment_interval: Interval for the repayment of the loanThe first thing we do is input the loan_kiva.csv data that we saved. using the read_csv () function. and will be saved with the name dataset so that the following dataframe is obtained
dataset <- read.csv("loan_kiva.csv")
head(dataset)We will do checking data and transform data.
In this section we will check whether our data contains missing values or not, and check the data type in each column. This step is important because if the data still contains missing values or the data type is not suitable, it will make it difficult for us to perform operations on the dataframe that we have created.
colSums(is.na(dataset))## id funded_amount loan_amount activity
## 0 0 0 0
## sector country region currency
## 0 0 0 0
## partner_id posted_time funded_time term_in_months
## 0 0 0 0
## lender_count repayment_interval
## 0 0
It means that our dataset does not have any missing values.The addition, I want to check the summary and str from dataset so I know the statistic of each columns and I can guess is the data type correct or not.
summary() to check the summarysummary(dataset)## id funded_amount loan_amount activity
## Min. : 823031 Min. : 25.0 Min. : 25.0 Length:165040
## 1st Qu.: 867063 1st Qu.: 275.0 1st Qu.: 275.0 Class :character
## Median : 913180 Median : 475.0 Median : 475.0 Mode :character
## Mean : 913167 Mean : 806.9 Mean : 806.9
## 3rd Qu.: 958990 3rd Qu.: 950.0 3rd Qu.: 950.0
## Max. :1002884 Max. :50000.0 Max. :50000.0
## sector country region currency
## Length:165040 Length:165040 Length:165040 Length:165040
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## partner_id posted_time funded_time term_in_months
## Min. : 9.0 Length:165040 Length:165040 Min. : 2.00
## 1st Qu.:126.0 Class :character Class :character 1st Qu.: 8.00
## Median :145.0 Mode :character Mode :character Median : 12.00
## Mean :172.9 Mean : 13.43
## 3rd Qu.:201.0 3rd Qu.: 14.00
## Max. :469.0 Max. :158.00
## lender_count repayment_interval
## Min. : 1.00 Length:165040
## 1st Qu.: 8.00 Class :character
## Median : 14.00 Mode :character
## Mean : 21.63
## 3rd Qu.: 26.00
## Max. :1605.00
str() to check the structure of dataframestr(dataset)## 'data.frame': 165040 obs. of 14 variables:
## $ id : int 823048 823138 823051 823143 823154 823117 823100 823132 823137 823050 ...
## $ funded_amount : int 450 200 175 675 1450 250 450 900 225 250 ...
## $ loan_amount : int 450 200 175 675 1450 250 450 900 225 250 ...
## $ activity : chr "Butcher Shop" "Food Production/Sales" "Fruits & Vegetables" "Fishing" ...
## $ sector : chr "Food" "Food" "Food" "Food" ...
## $ country : chr "Philippines" "Philippines" "Philippines" "Philippines" ...
## $ region : chr "Kabankalan, Negros Occidental" "Baybay, Leyte" "Kabankalan, Negros Occidental" "Carles, Iloilo" ...
## $ currency : chr "PHP" "PHP" "PHP" "PHP" ...
## $ partner_id : int 145 125 145 125 202 145 245 125 125 145 ...
## $ posted_time : chr "2015-01-01 02:38:53" "2015-01-01 15:57:25" "2015-01-01 02:49:11" "2015-01-01 16:20:43" ...
## $ funded_time : chr "2015-01-01 16:20:50" "2015-01-05 00:21:31" "2015-01-01 06:12:18" "2015-01-05 19:21:50" ...
## $ term_in_months : int 8 10 8 20 11 8 14 12 12 5 ...
## $ lender_count : int 14 5 4 26 45 7 16 19 7 7 ...
## $ repayment_interval: chr "irregular" "irregular" "irregular" "irregular" ...
From the previous step, we know that the data not have missing value but there are several columns with the wrong data type. So in this part we do Transforming data by changing the wrong data type to correct ones.
factor using as.factordataset[,c("sector","country","region","currency","repayment_interval")] <-
lapply(dataset[,c("sector","country","region","currency","repayment_interval")],
as.factor)date using package lubridate Before that, we must call the lubridate with function library()library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, unionand now we can use the package
dataset$posted_time <- ymd_hms(dataset$posted_time)
dataset$funded_time <- ymd_hms(dataset$funded_time)Let’s check data structure again to see that the data type have already changed.
str(dataset)## 'data.frame': 165040 obs. of 14 variables:
## $ id : int 823048 823138 823051 823143 823154 823117 823100 823132 823137 823050 ...
## $ funded_amount : int 450 200 175 675 1450 250 450 900 225 250 ...
## $ loan_amount : int 450 200 175 675 1450 250 450 900 225 250 ...
## $ activity : chr "Butcher Shop" "Food Production/Sales" "Fruits & Vegetables" "Fishing" ...
## $ sector : Factor w/ 15 levels "Agriculture",..: 7 7 7 7 1 7 12 1 7 7 ...
## $ country : Factor w/ 77 levels "Albania","Armenia",..: 53 53 53 53 29 53 48 53 53 53 ...
## $ region : Factor w/ 6374 levels "","\"\"The first May\"\" village",..: 2356 686 2356 1118 6233 5626 2257 686 686 2356 ...
## $ currency : Factor w/ 62 levels "ALL","AMD","AZN",..: 40 40 40 40 22 40 41 40 40 40 ...
## $ partner_id : int 145 125 145 125 202 145 245 125 125 145 ...
## $ posted_time : POSIXct, format: "2015-01-01 02:38:53" "2015-01-01 15:57:25" ...
## $ funded_time : POSIXct, format: "2015-01-01 16:20:50" "2015-01-05 00:21:31" ...
## $ term_in_months : int 8 10 8 20 11 8 14 12 12 5 ...
## $ lender_count : int 14 5 4 26 45 7 16 19 7 7 ...
## $ repayment_interval: Factor w/ 3 levels "bullet","irregular",..: 2 2 2 2 1 2 3 2 2 2 ...
head(dataset)most_sector <- as.data.frame(table(dataset$sector))
most_sector <- most_sector[order(most_sector$Freq, decreasing = T),]
most_sectormost_sector[1:3,]Answer : Top 3 sector are Agriculture, Food, and Retail
monthly based on the sector? (Jumlah uang pinjaman yang diberikan kiva kepada agen di kenya untuk jenis jangka waktu pengembalian monthly berdasarkan sectornya?)kenya <- dataset[dataset$country == "Kenya",]sum_of_funded <- aggregate(funded_amount ~ sector + repayment_interval,
kenya[kenya$repayment_interval == "monthly",],
sum)
sum_of_funded <- sum_of_funded[order(sum_of_funded$funded_amount,
decreasing = T),]
sum_of_fundedsum(sum_of_funded$funded_amount)## [1] 5073800
library(ggplot2)
ggplot(data = sum_of_funded,
mapping = aes(x = funded_amount,y = reorder(sector, funded_amount)))+
geom_col(mapping = aes(fill = funded_amount), show.legend = F)+
geom_col(fill = "darkgoldenrod2", data = sum_of_funded[1:3,])+
scale_fill_gradient(low = "tan4", high = "cornsilk")+
labs(title = "Funded Amount of Kiva to Agent in Kenya",
subtitle = "based on sector",
x = "Funded Amount",
y= NULL)Insight:
retail untuk setiap repayment_interval?)kenya$funding_days <- difftime(kenya$funded_time, kenya$posted_time,
units = "days")
head(kenya)kenya_agg <- aggregate(funding_days ~ repayment_interval + sector,
kenya,
mean)
kenya_agg <- kenya_agg[order(kenya_agg$funding_days, decreasing = T),]
kenya_aggggplot(data = kenya_agg, mapping = aes(x = funding_days, y = sector ))+
geom_col(aes(fill = repayment_interval), position = "dodge")+
labs(title = "Average Funding Duration on Each Repyament Interval and Sector",
x = "Funded Amount",
y= NULL)Agriculture sector? (pada bulan apa kiva melayani pemberian pinjaman disector Agriculture terbanyak?)dataset$month <- month(dataset$posted_time, label = T)
head(dataset)agg <- as.data.frame(table(dataset$sector, dataset$month))
agg <- agg[agg$Var1 == "Agriculture",]
agg <- agg[order(agg$Freq, decreasing = T),]
aggAnswer : Kiva serve the largest number of loans in the Agriculture sector was in March
month and repayment_interval what is the fastest financing duration and in which country? (pada bulan dan jangka waktu pengambilan apa durasi pembiayaan tercepat dan di negara mana?)dataset$funding_duration <- difftime(dataset$funded_time,dataset$posted_time,
units = "hours")
head(dataset)loan_agg <- aggregate(funding_duration ~ month + repayment_interval + country,
dataset,
mean)
loan_agg <- loan_agg[order(loan_agg$funding_duration, decreasing = F),]
head(loan_agg)Based on the results of the analysis where the fastest funding_duration is in the country of Lesotho, it will be seen that the trending funding_duration is in Lesotho.
lesotho <- dataset[dataset$country == "Lesotho",]
head(lesotho)leshoto_agg <- aggregate(funding_duration ~ month + repayment_interval, lesotho,
mean)
lesotho_agg <- lesotho[order(lesotho$funding_duration, decreasing = T),]
head(lesotho_agg)ggplot(leshoto_agg, aes(x =month, y = funding_duration,
color = repayment_interval,
group = repayment_interval))+
geom_line()+
geom_point()+
labs(title = "Funding Duration Trend on Lesotho",
x = NULL, y = "Funding Duration",
color = "Repayment Interval")+
theme_minimal()+
theme(legend.position = "top")jml_loanamount <- as.data.frame(xtabs(formula = loan_amount ~ sector,
data = dataset))
jml_loanamount <- jml_loanamount[order(jml_loanamount$Freq,
decreasing = T),]
jml_loanamountggplot(jml_loanamount,
aes(x = reorder(sector, Freq),
y=Freq))+
geom_col()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs(title = "Jumlah loan_amount untuk setiap sector",
x = "Sector")top5_id <- as.data.frame(aggregate(loan_amount ~ id, dataset, sum))
top5_id <- top5_id[order(top5_id$loan_amount, decreasing = T),]
top5_id[1:5,]food <- dataset[(dataset$sector == "Food")&(dataset$loan_amount > 500),]
food <- food[order(food$lender_count),]
food[1:10,c("activity","loan_amount","lender_count")]negara <- c("Pakistan", "Mexico", "Guatemala")
funding_trend <- aggregate(funding_duration ~ country + repayment_interval + month,
dataset[dataset$country %in% negara,],
mean)
funding_trendggplot(data = funding_trend, mapping = aes(x = month, y = funding_duration))+
geom_line(aes(group = repayment_interval, col = repayment_interval))+
geom_point(aes(group = repayment_interval, col = repayment_interval))+
facet_wrap(~country, nrow = 3)+
labs(title = "Trending funding_duration based on repayment_interval",
subtitle = "In Guatemala, Mexico, Pakistan",
x = NULL,
y = "Funding Duration (in hours)")insight
bolivia <- dataset[(dataset$country == "Bolivia")&(dataset$activity == "Farming"),]
sum(bolivia$loan_amount)## [1] 83975
Answer : Total loan in farming activity at Bolivia in 2015 is 83.975 USD
Since 2005, Kiva has crowd-funded more than 1.6 million loans, totaling over $1.33 billion, with a repayment rate of 95.8 percent. Over 1.8 million lenders worldwide use the Kiva platform. Lenders do not receive interest on the money they lend. Kiva includes personal stories of each person who applies for a loan so that lenders can connect with borrowers on a human level.
From the data, it is found that the Agriculture, Food, and Retail sectors are the top 3 sectors that often make loans to Kiva. But now we are interested in seeing Kiva’s customers in Kenya countries. Based on Kiva’s data, it serves loans of up to around 5.073.800 USD with the most borrowing sector is Agriculture and the average lending is around 0-22 days. Loans in the Agriculture sector are often given in March. In contrast to Kenya, the average lending rate in Lesotho countries is the fastest given by Kiva, which is about 4 hours. However, after examining it, it turns out that in a Lesotho country, Kiva only serves repayment_interval of the monthly type.
That is a little insight that can be obtained from the kiva dataset, there are many more insights that can be taken from the kiva dataset, hopefully this article can help readers a little. happy learning everyone ^^