Kiva Loans

Kiva loans are very small loans, called microloans, made to entrepreneurs who need seed money to start their businesses. The loans aim to improve communities one entrepreneur at a time. The dataset used in this vignette consists of Kiva loans made around the globe in calendar year 2016. For the purposes of this vignette, the loan data was pared down to keep the file size under 25 MB.

kiva <- read.csv("https://raw.githubusercontent.com/douglasbarley/FALL2020TIDYVERSE/TidyverseVignette/kiva_loans.csv")

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
glimpse(kiva)
## Rows: 197,236
## Columns: 14
## $ id                 <int> 1002924, 1002908, 1002897, 1002916, 1002891, 100...
## $ funded_amount      <int> 500, 500, 500, 500, 575, 1600, 500, 300, 300, 62...
## $ loan_amount        <int> 500, 500, 500, 500, 575, 1600, 500, 300, 300, 62...
## $ activity           <fct> Rickshaw, Rickshaw, Fruits & Vegetables, Clothin...
## $ sector             <fct> Transportation, Transportation, Food, Clothing, ...
## $ country_code       <fct> PK, PK, PK, PK, PK, KG, PK, PK, PK, UA, PK, MG, ...
## $ country            <fct> Pakistan, Pakistan, Pakistan, Pakistan, Pakistan...
## $ region             <fct> "Multan", "Lahore", "Multan", "Lahore", "Lahore"...
## $ partner_id         <int> 247, 247, 247, 247, 247, 171, 247, 247, 247, 26,...
## $ term_in_months     <int> 14, 11, 15, 11, 11, 14, 11, 11, 12, 26, 14, 6, 1...
## $ lender_count       <int> 1, 1, 17, 16, 21, 64, 20, 1, 12, 23, 12, 3, 18, ...
## $ borrower_genders   <fct> female, female, female, female, female, female, ...
## $ repayment_interval <fct> monthly, irregular, monthly, irregular, irregula...
## $ date               <fct> 1/1/2016, 1/1/2016, 1/1/2016, 1/1/2016, 1/1/2016...

The 2016 data includes 197,236 observations of 14 variables.
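
The <fct> columns in the glimpse() output reflect R 3.6, where read.csv() converted strings to factors by default. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the same call would produce <chr> columns instead. The sketch below uses a small made-up CSV (the rows are hypothetical, not taken from the Kiva file) to show how factors can be requested explicitly so that the result matches across R versions.

```r
# Hypothetical three-row CSV standing in for the Kiva file
csv_text <- "country,sector,loan_amount
Pakistan,Transportation,500
Kenya,Food,300
Kenya,Agriculture,250"

# Explicitly request factors so the result is the same in R 3.x and R 4.x
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Inspect the column types: country and sector are factors, loan_amount is integer
str(toy)
```

The grouping and plotting code in this vignette should work with either factors or character columns, so this setting matters mainly when reproducing the printed output exactly.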

Tidyverse group_by() function

The Tidyverse contains many packages that are useful in R for cleaning and exploring data. When faced with a fairly long dataset, such as the Kiva set in this example, it is useful to count how many rows belong to each distinct value in a column. The group_by() function from dplyr, combined with summarize(), does just that. This lets a programmer quickly explore what is in the data.

For example, it could be useful to know which countries received the most loans.

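# Count the number of loans made in each country
# (kiva is already a data frame, so no data.frame() wrapper is needed)
countries <- kiva %>%
  group_by(country) %>%
  summarize(count_loans = n())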

head(countries)
## # A tibble: 6 x 2
##   country     count_loans
##   <fct>             <int>
## 1 Afghanistan           1
## 2 Albania             476
## 3 Armenia            2987
## 4 Azerbaijan          303
## 5 Belize               23
## 6 Bolivia            2488
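
The group-and-count pattern can be tried on a small data frame before running it on the full Kiva file. The sketch below uses made-up values (toy_loans is hypothetical) and also shows dplyr's count(), a shorthand for the same group-and-tally step.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy_loans <- data.frame(
  country = c("Kenya", "Peru", "Kenya", "Uganda", "Kenya", "Peru"),
  loan_amount = c(300, 500, 250, 400, 350, 600)
)

# One row per country, with the number of loans in each group
by_country <- toy_loans %>%
  group_by(country) %>%
  summarize(count_loans = n())

# count() is a shorthand for the same group-and-tally step
by_country_short <- toy_loans %>%
  count(country, name = "count_loans")

by_country
```

Both approaches give one row per country with the same counts. The explicit group_by() form is easier to extend when other summaries, such as a mean loan amount, are needed alongside the count.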

Visualizing the results

Once we have a concise count of loans by country, it helps to visualize all of the results in a single graphic. The ggplot() function from the ggplot2 package, also part of the Tidyverse, is well suited to this.

ggplot(data = countries) +
  geom_col(aes(x = country, y = count_loans)) +
  ggtitle("Loans Disbursed by Country") +
  coord_flip() +
  ylab("Loan Count") +
  xlab("Country")

There are so many countries where loans were disbursed that it is difficult to read each country’s name. In order to simplify the listing and visualizations, let’s identify the top 10 countries that received loans.

countries_top10 <- head(arrange(countries,desc(count_loans)), n = 10)
countries_top10
## # A tibble: 10 x 2
##    country     count_loans
##    <fct>             <int>
##  1 Philippines       48317
##  2 Kenya             20604
##  3 Cambodia          11590
##  4 El Salvador        9454
##  5 Pakistan           8777
##  6 Colombia           7170
##  7 Tajikistan         6318
##  8 Peru               6215
##  9 Ecuador            5038
## 10 Uganda             4524
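
As an alternative to nesting head() and arrange(), dplyr 1.0.0 and later provide slice_max(), which keeps the rows with the largest values of a column. The sketch below uses made-up counts (toy_counts is hypothetical) and is not the approach used in this vignette.

```r
library(dplyr)

# Hypothetical per-country counts, made up for illustration
toy_counts <- data.frame(
  country = c("A", "B", "C", "D"),
  count_loans = c(10, 40, 25, 5)
)

# Keep the 2 rows with the largest count_loans, largest first
top2 <- toy_counts %>%
  slice_max(count_loans, n = 2)

top2
```

Note that slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when several groups share the cutoff value, whereas head() always returns exactly n.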

Now we can graph the top 10 countries that received loans.

ggplot(data = countries_top10) +
  geom_col(aes(x = reorder(country, count_loans), y = count_loans)) +
  ggtitle("Loans Disbursed by Country") +
  coord_flip() +
  ylab("Loan Count") +
  xlab("Country")

That’s much more legible! Now we can see that the Philippines received the most Kiva loans of any country in 2016.
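
The bars in the top 10 chart are sorted because of the reorder() call inside aes(). It rearranges the factor levels of country by count_loans instead of leaving them in alphabetical order. A minimal sketch with made-up counts (toy_counts is hypothetical) shows the effect on the levels:

```r
# Hypothetical counts, made up for illustration
toy_counts <- data.frame(
  country = c("A", "B", "C"),
  count_loans = c(30, 10, 20)
)

# Default factor levels are alphabetical: A, B, C
default_levels <- levels(factor(toy_counts$country))

# reorder() sorts the levels by the second argument, in ascending order
ordered_country <- reorder(toy_counts$country, toy_counts$count_loans)
ordered_levels <- levels(ordered_country)

ordered_levels
```

Here the levels become B, C, A. With coord_flip(), the first level is drawn at the bottom of the chart, so the country with the most loans ends up at the top.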

The following extends Doug’s example to examine which sectors, and which activities within those sectors, received the most loans.

sector <- data.frame(kiva) %>%
  group_by(sector) %>%
  summarize(count_loans = n())
## `summarise()` ungrouping output (override with `.groups` argument)
head(sector)
## # A tibble: 6 x 2
##   sector        count_loans
##   <fct>               <int>
## 1 Agriculture         52647
## 2 Arts                 3909
## 3 Clothing             8957
## 4 Construction         1648
## 5 Education            9959
## 6 Entertainment         245
activity <- data.frame(kiva) %>%
  group_by(activity) %>%
  summarize(count_loans = n())
## `summarise()` ungrouping output (override with `.groups` argument)
head(activity)
## # A tibble: 6 x 2
##   activity         count_loans
##   <fct>                  <int>
## 1 Agriculture             6033
## 2 Air Conditioning          12
## 3 Animal Sales            2578
## 4 Arts                     344
## 5 Auto Repair              383
## 6 Bakery                  1066
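
The two tables above count sectors and activities separately. To see which activities fall within each sector, one option is to group by both columns at once. The sketch below uses made-up sector/activity pairs (toy_loans is hypothetical) rather than the Kiva data, and it is an illustration of the idea, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data; the sector/activity pairs are made up for illustration
toy_loans <- data.frame(
  sector = c("Retail", "Retail", "Retail", "Agriculture", "Agriculture"),
  activity = c("General Store", "General Store", "Retail", "Farming", "Pigs")
)

# Count loans for each sector/activity pair, then sort within each sector
sector_activity <- toy_loans %>%
  group_by(sector, activity) %>%
  summarize(count_loans = n(), .groups = "drop") %>%
  arrange(sector, desc(count_loans))

sector_activity
```

Setting .groups = "drop" returns an ungrouped result and also silences the `summarise()` message about grouping that dplyr 1.0.x prints.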

Visualizing the results

With loan counts by sector and by activity in hand, it is helpful to visualize the results, again using ggplot(). Because the activity column has many more distinct values than the sector column, the activities are also narrowed to the top 10 to keep the chart legible.

ggplot(data = sector) +
  geom_col(aes(x = reorder(sector, count_loans), y = count_loans)) +
  ggtitle("Loans Disbursed by Sector") +
  coord_flip() +
  ylab("Loan Count") +
  xlab("Sector")

ggplot(data = activity) +
  geom_col(aes(x = reorder(activity, count_loans), y = count_loans)) +
  ggtitle("Loans Disbursed by Activity") +
  coord_flip() +
  ylab("Loan Count") +
  xlab("Activity")

activity_top10 <- head(arrange(activity,desc(count_loans)), n = 10)
activity_top10
## # A tibble: 10 x 2
##    activity                  count_loans
##    <fct>                           <int>
##  1 Farming                         21967
##  2 General Store                   19206
##  3 Pigs                             8915
##  4 Personal Housing Expenses        8896
##  5 Home Appliances                  8531
##  6 Food Production/Sales            8422
##  7 Clothing Sales                   6484
##  8 Agriculture                      6033
##  9 Retail                           5938
## 10 Higher education costs           5726
ggplot(data = activity_top10) +
  geom_col(aes(x = reorder(activity, count_loans), y = count_loans)) +
  ggtitle("Loans Disbursed by Activity") +
  coord_flip() +
  ylab("Loan Count") +
  xlab("Activity")

Conclusion: The Agriculture sector received the most Kiva loans, which makes sense because agriculture is a major driver of the economy in several of these developing countries. Within that sector, Farming, as defined in the activity column, was the most common activity. Similarly, Retail was the second-largest recipient of Kiva loans, and a closer look shows that this is mostly General Store activity.