DATA 606 01[15958] : Final Project [05/15]

1 Part 1 - Introduction

Kiva Microfunds (commonly known by its domain name, Kiva.org) is a non-profit organization that allows people to lend money via the Internet to low-income entrepreneurs and students in over 80 countries. Kiva’s mission is “to connect people through lending to alleviate poverty.”

We will use the Kaggle dataset Multidimensional Poverty Measures to investigate poverty around the world. We connect these measures with the Kiva loans for a more complete understanding, and explore the different metrics in the Multidimensional Poverty Measures.

Multidimensional poverty measures can be used to create a more comprehensive picture. They reveal who is poor and how they are poor: the range of different disadvantages they experience. The higher the MPI, the poorer the country.
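As background, the global MPI is conventionally computed as the product of two numbers: the share of people who are multidimensionally poor (the headcount ratio, H) and the average share of weighted deprivations those people experience (the intensity, A). The sketch below illustrates that product in R. The function name `mpi_from_components` and the input values are made up for illustration and are not taken from the dataset.

```r
# Illustrative only: hypothetical helper with made-up inputs.
# headcount_pct: percent of the population that is multidimensionally poor (H)
# intensity_pct: average percent of weighted deprivations among the poor (A)
mpi_from_components <- function(headcount_pct, intensity_pct) {
  (headcount_pct / 100) * (intensity_pct / 100)
}

# Half the population is poor, and the poor are deprived in 40% of the
# weighted indicators on average.
example_mpi <- mpi_from_components(50, 40)
print(example_mpi)
```

Read this way, a country can have a high MPI either because many people are poor or because those who are poor face many deprivations at once.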

From Wikipedia:

The Human Development Index (HDI) is a composite statistic (composite index) of life expectancy, education, and per capita income indicators, which are used to rank countries into four tiers of human development. A country scores higher HDI when the lifespan is higher, the education level is higher, and the GDP per capita is higher.

Kiva should direct loans to the countries and regions where they would do the most to improve the Human Development Index.

2 Part 2 - Data Preparation

2.1 Load Libraries

  • readr
  • tidyverse
  • stringr
  • lubridate
  • DT
  • leaflet
  • knitr
  • treemap
  • caret
  • data.table
  • statsr
  • broom
  • forecast
  • prophet
libraries_used <- c("readr", "tidyverse", "stringr", "lubridate", "DT", "leaflet", "knitr", "treemap", "caret", "forecast", "prophet", "data.table", "broom", "statsr")

# check missing libraries
libraries_missing <- libraries_used[!(libraries_used %in% installed.packages()[,"Package"])]
# install missing libraries
if(length(libraries_missing)) install.packages(libraries_missing, repos = "http://cran.us.r-project.org")

library(readr)      # read data
library(tidyverse)  # data manipulation and graphs
library(stringr)    # string manipulation
library(lubridate)  # date manipulation
library(DT)         # table format display of data
library(leaflet)    # maps
library(knitr)
library(treemap)
library(caret)
library(forecast)
library(prophet)
library(data.table)
library(broom)
library(statsr)

load("nc.Rdata")

2.2 Data collection

Download and load data from Kaggle

rm(list=ls())

fillColor = "#FFA07A"
fillColor2 = "#F1C40F"
fillColorLightCoral = "#F08080"

loans <- readr::read_csv('kiva_loans.csv')
regions <- readr::read_csv("kiva_mpi_region_locations.csv")
themes <- readr::read_csv("loan_theme_ids.csv")
themes_region <- readr::read_csv("loan_themes_by_region.csv")

mpi_national = readr::read_csv('MPI_national.csv')
mpi_subnational = readr::read_csv('MPI_subnational.csv')

country_stats <- readr::read_csv("country_stats.csv")
GEconV4 = readr::read_delim(file = "GEconV4.csv",delim=";")

countries_continents = readr::read_csv('countries and continents.csv')

ConflictsData =  readr::read_csv("african_conflicts.csv", 
                                col_types = readr::cols(.default = readr::col_character(),
                                                        FATALITIES = readr::col_integer(),
                                                        GEO_PRECISION = readr::col_integer(),
                                                        GWNO = readr::col_integer(),
                                                        INTER1 = readr::col_integer(),
                                                        INTER2 = readr::col_integer(),
                                                        INTERACTION = readr::col_integer(),
                                                        LATITUDE = readr::col_character(),
                                                        LONGITUDE = readr::col_character(),
                                                        TIME_PRECISION = readr::col_integer(),
                                                        YEAR = readr::col_integer()
                                                      ))

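# Treat anything that is not a plain decimal number as missing.
# The optional leading "-" keeps southern latitudes and western longitudes.
ConflictsData$LATITUDE[!grepl("^-?[0-9.]+$", ConflictsData$LATITUDE)] <- NA
ConflictsData$LONGITUDE[!grepl("^-?[0-9.]+$", ConflictsData$LONGITUDE)] <- NA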
 
ConflictsData$LATITUDE = as.numeric(as.character(ConflictsData$LATITUDE))
ConflictsData$LONGITUDE = as.numeric(as.character(ConflictsData$LONGITUDE))

loans$use = trimws(loans$use)
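The coordinate clean-up above reads latitude and longitude as text, blanks out anything that does not look like a number, and then converts to numeric. One point worth checking with any such pattern is whether negative values survive, since large parts of Africa lie south of the equator or west of the prime meridian. The base R sketch below uses made-up strings and a pattern with an optional leading minus sign. The vector `raw` and its values are illustrative assumptions.

```r
# Made-up coordinate strings: valid positives, a valid negative, and junk
raw <- c("12.5", "-1.2921", "abc", "", "36.8219", NA)

# Accept an optional minus sign, digits, and an optional decimal part
is_numeric_coord <- grepl("^-?[0-9]+(\\.[0-9]+)?$", raw)

# Anything that fails the check becomes NA before conversion
clean <- ifelse(is_numeric_coord, raw, NA_character_)
coords <- as.numeric(clean)

print(coords)
```

The negative value is kept, while the non-numeric entries, the empty string and the missing value all end up as NA.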

2.3 Cases

This is an observational study. People from different countries have requested a loan or charitable donation through Kiva. Each row contains the borrower's information and current residence, along with the repayment status.

For this project, I used the complete dataset in some cases, and for some analyses I took a simple random sample of 10,000 rows.
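A simple random sample of this kind can be drawn reproducibly by fixing the random seed and sampling row indices without replacement. The sketch below uses a made-up data frame `toy_loans` in place of the real loans data. The seed value and column contents are illustrative assumptions.

```r
# Fix the seed so the same rows are drawn on every run
set.seed(606)

# Stand-in for the loans data: 50,000 made-up rows
toy_loans <- data.frame(
  id = 1:50000,
  funded_amount = round(rexp(50000, rate = 1 / 500))
)

# Draw 10,000 distinct row indices (sampling without replacement)
idx <- sample(nrow(toy_loans), size = 10000)
loans_sample <- toy_loans[idx, ]

print(dim(loans_sample))
```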

2.4 Variables

The loans data has about 20 variables, including the funded amount, loan amount and activity, and around 670K observations.

But for this current analysis, below are the variables used.

Kiva Loans | Description
loan_amount | The listed amount of the loan applied for by the borrower. If at some point in time the credit department reduces the loan amount, it will be reflected in this value.
funded_amount | The total amount committed to that loan at that point in time.
country | Country where the Kiva loan was granted.
region | Specific region/city in the country where the Kiva loan was granted.
term_in_months | The number of payments on the loan. Values are in months and range from 1 to 160.

2.5 Type of study

This is an observational study. We will arrive at conclusions by performing the tests below on the variables mentioned.

  1. Hypothesis Test - Assess whether an observed difference could be due to chance alone.
  2. F-Test - Compare multiple variables.
  3. Linear regression - Fit the regression line with the available parameters and compare the predicted and observed outcomes.
  4. Logarithmic regression - Create a model for non-linear data variables.
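The four steps above can be sketched with base R on simulated data. Everything in this block is an assumption for illustration: the data frame `toy`, its values, and the choice of a two-sample t-test, a one-way ANOVA F-test, and a regression on the log of the response as one reading of "logarithmic regression". It is not the analysis run on the Kiva data.

```r
set.seed(1)
n <- 300

# Simulated stand-in for the loans data
toy <- data.frame(
  sector = factor(sample(c("Agriculture", "Food", "Retail"), n, replace = TRUE)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  term_in_months = sample(6:24, n, replace = TRUE)
)
# Loan amount grows multiplicatively with the term, plus noise
toy$loan_amount <- exp(5 + 0.04 * toy$term_in_months + rnorm(n, sd = 0.5))

# 1. Hypothesis test: do mean loan amounts differ by gender?
t_res <- t.test(loan_amount ~ gender, data = toy)

# 2. F-test: do mean loan amounts differ across sectors? (one-way ANOVA)
f_res <- anova(lm(loan_amount ~ sector, data = toy))

# 3. Linear regression of loan amount on term
fit_lin <- lm(loan_amount ~ term_in_months, data = toy)

# 4. Regression on the log scale for a non-linear relationship
fit_log <- lm(log(loan_amount) ~ term_in_months, data = toy)

print(t_res$p.value)
print(f_res)
print(coef(fit_lin))
print(coef(fit_log))
```

Comparing the residual plots of `fit_lin` and `fit_log` is one way to judge whether the log transformation is worthwhile for a skewed response such as loan amount.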

2.6 Scope of inference

  1. Generalizability: The population of Kiva loans in this study is global, but we have specifically studied Africa, Asia and the Americas, along with one country from each of these continents, namely Kenya, India and El Salvador. Borrowing from a bank requires credit score information and personal information such as home ownership, purpose of the loan, employment length and annual income, which may not be applicable to Kiva loans.

  2. Bias: The bias that limits generalizability here lies in the borrower information. Only people who know about Kiva request a loan through Kiva. A bank might use another confounding variable to set the interest rate. Often, countries with a bad credit score get a higher-interest loan because low-graded loans carry a risk of being written off.

2.7 Causality

As this is an observational study we cannot derive any causal connections between the variables.

2.7.1 Glimpse of Data

2.7.1.1 Loans data

Loans data contains data about some of Kiva’s loans. It is a subset of Kiva’s data snapshots.

tibble::glimpse(loans)
## Observations: 671,205
## Variables: 20
## $ id                 <dbl> 653051, 653053, 653068, 653063, 653084, 108...
## $ funded_amount      <dbl> 300, 575, 150, 200, 400, 250, 200, 400, 475...
## $ loan_amount        <dbl> 300, 575, 150, 200, 400, 250, 200, 400, 475...
## $ activity           <chr> "Fruits & Vegetables", "Rickshaw", "Transpo...
## $ sector             <chr> "Food", "Transportation", "Transportation",...
## $ use                <chr> "To buy seasonal, fresh fruits to sell.", "...
## $ country_code       <chr> "PK", "PK", "IN", "PK", "PK", "KE", "IN", "...
## $ country            <chr> "Pakistan", "Pakistan", "India", "Pakistan"...
## $ region             <chr> "Lahore", "Lahore", "Maynaguri", "Lahore", ...
## $ currency           <chr> "PKR", "PKR", "INR", "PKR", "PKR", "KES", "...
## $ partner_id         <dbl> 247, 247, 334, 247, 245, NA, 334, 245, 245,...
## $ posted_time        <dttm> 2014-01-01 06:12:39, 2014-01-01 06:51:08, ...
## $ disbursed_time     <dttm> 2013-12-17 08:00:00, 2013-12-17 08:00:00, ...
## $ funded_time        <dttm> 2014-01-02 10:06:32, 2014-01-02 09:17:23, ...
## $ term_in_months     <dbl> 12, 11, 43, 11, 14, 4, 43, 14, 14, 11, 11, ...
## $ lender_count       <dbl> 12, 14, 6, 8, 16, 6, 8, 8, 19, 24, 3, 16, 1...
## $ tags               <chr> NA, NA, "user_favorite, user_favorite", NA,...
## $ borrower_genders   <chr> "female", "female, female", "female", "fema...
## $ repayment_interval <chr> "irregular", "irregular", "bullet", "irregu...
## $ date               <date> 2014-01-01, 2014-01-01, 2014-01-01, 2014-0...

2.7.1.2 Regions data

This data contains Kiva’s estimates of the geolocation of subnational MPI regions.

DT::datatable(head(regions), style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))

2.7.1.3 Themes data

This data contains Kiva’s loan themes.

DT::datatable(head(themes), style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))

2.7.1.4 Themes and Regions data

This data maps Kiva’s loan themes to the Regions data.

tibble::glimpse(themes_region)
## Observations: 15,736
## Variables: 21
## $ `Partner ID`         <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,...
## $ `Field Partner Name` <chr> "KREDIT Microfinance Institution", "KREDI...
## $ sector               <chr> "General Financial Inclusion", "General F...
## $ `Loan Theme ID`      <chr> "a1050000000slfi", "a10500000068jPe", "a1...
## $ `Loan Theme Type`    <chr> "Higher Education", "Vulnerable Populatio...
## $ country              <chr> "Cambodia", "Cambodia", "Cambodia", "Camb...
## $ forkiva              <chr> "No", "No", "No", "No", "No", "No", "No",...
## $ region               <chr> "Banteay Meanchey", "Battambang Province"...
## $ geocode_old          <chr> "(13.75, 103.0)", NA, NA, "(12.0, 105.5)"...
## $ ISO                  <chr> "KHM", "KHM", "KHM", "KHM", "KHM", "KHM",...
## $ number               <dbl> 1, 58, 7, 1383, 3, 36, 2, 249, 7, 18, 890...
## $ amount               <dbl> 450, 20275, 9150, 604950, 275, 62225, 130...
## $ LocationName         <chr> "Banteay Meanchey, Cambodia", "Battambang...
## $ geocode              <chr> "[(13.6672596, 102.8975098)]", "[(13.0286...
## $ names                <chr> "Banteay Meanchey Province; Cambodia", "B...
## $ geo                  <chr> "(13.6672596, 102.8975098)", "(13.0286971...
## $ lat                  <dbl> 13.66726, 13.02870, 13.02870, 12.09829, 1...
## $ lon                  <dbl> 102.8975, 102.9896, 102.9896, 105.3131, 1...
## $ mpi_region           <chr> "Banteay Mean Chey, Cambodia", "Banteay M...
## $ mpi_geo              <chr> "(13.6672596, 102.8975098)", "(13.6672596...
## $ rural_pct            <dbl> 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 9...

3 Part 3 - Exploratory Data Analysis

Below are some exploratory data analysis charts to understand more about the data.

3.1 Explanatory

The table below summarizes each question with its response and explanatory variables. It also shows whether each variable is numerical or categorical.

Question | Response Variable | Explanatory Variable
1. What are Kiva loans used for? | use (Categorical) | sector (Categorical), activity (Categorical)
2. Popular sector for Kiva loans for each continent and country? | loan_amount (Numerical) | sector (Categorical), purpose (Categorical)
3. Who funds Kiva loans? | Field Partner Name (Categorical) | amount (Numerical)
4. Countries that require the most loans to improve the Human Development Index? | MPI National (Numerical) | -
5. Commonly requested loan terms and gender? | term_in_months (Numerical) | borrower_genders (Categorical)

3.5 Distribution of the Funded Loan amount

The funded loan amount is shown as a histogram. Both the X axis and the Y axis have been log transformed for better visualization.

fundedLoanAmountDistribution <- function(loans)
{
  loans %>%
    ggplot(aes(x = funded_amount) )+
    scale_x_log10(
                  breaks = scales::trans_breaks("log10", function(x) 10^x),
                  labels = scales::trans_format("log10", scales::math_format(10^.x))
    ) +
    scale_y_log10(
                  breaks = scales::trans_breaks("log10", function(x) 10^x),
                  labels = scales::trans_format("log10", scales::math_format(10^.x))
    ) + 
    geom_histogram(fill = fillColor2,bins=50) +
    labs(x = 'Funded Loan Amount' ,y = 'Count', title = paste("Distribution of", "Funded Loan Amount")) +
    theme_bw()
}

fundedLoanAmountDistribution(loans)
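One detail to keep in mind with log-scaled axes: log10(0) is -Inf, so loans with a funded amount of 0 cannot be placed on the X axis, and ggplot2 drops such non-finite values with a warning. The base R sketch below shows the effect on a handful of made-up amounts. The vector `funded` is an illustrative assumption.

```r
# Made-up funded amounts, including one unfunded loan
funded <- c(0, 25, 300, 575, 1000, 50000)

# The zero becomes -Inf on the log scale
log_funded <- log10(funded)
print(is.finite(log_funded))

# Only strictly positive amounts can be binned on a log axis
positive <- funded[funded > 0]
h <- hist(log10(positive), breaks = 10, plot = FALSE)

# Number of observations that actually appear in the histogram
print(sum(h$counts))
```

When a dataset has many unfunded loans, it can be worth reporting their count separately, since they are invisible in the log-scaled chart.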

3.5.1 Distribution by Country

loans_funded_amount = loans %>%
  group_by(country) %>%
  summarise(FundedAmount = sum(funded_amount)) %>%
  arrange(desc(FundedAmount)) %>%
  ungroup() %>%
  mutate(country = reorder(country,FundedAmount)) %>%
  head(20) 

treemap(loans_funded_amount, 
        index="country", 
        vSize = "FundedAmount",  
        title="Funded Amount", 
        palette = "RdBu",
        fontsize.title = 14)

  • Philippines, Kenya, Peru, Paraguay and El Salvador receive the most funding through Kiva loans.
### Summary of Funded Amount
loans %>%
   select(funded_amount) %>%
   summary()

3.5.2 Distribution by Sector

The funded loan amount is shown sector wise below. The amount has been scaled by log10.

loans %>%
  mutate( fill = as.factor(sector)) %>%
      ggplot(aes(x = sector, y= funded_amount, fill = sector)) +
      scale_y_log10(
        breaks = scales::trans_breaks("log10", function(x) 10^x),
        labels = scales::trans_format("log10", scales::math_format(10^.x))
      ) +
  geom_boxplot() +
  labs(x= 'Sector Type',y = 'Funded Amount', 
       title = paste("Distribution of", ' Funded Amount ')) +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

3.5.3 Distribution by Gender

loans %>%
  filter(!is.na(borrower_genders)) %>%
  mutate(gender = ifelse(str_detect(borrower_genders,"female"), "female","male")) %>%
  group_by(gender) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(gender = reorder(gender,Count)) %>%
  head(10) %>%
  ggplot(aes(x = gender,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor) +
  geom_text(aes(x = gender, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Gender', 
       y = 'Count', 
       title = 'Gender and Count') +
  coord_flip() +
  theme_bw()

  • As shown above, female borrowers account for more loans than male borrowers.
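The `borrower_genders` column can list several borrowers per loan (for example "female, female"), and the code above labels a loan as female whenever any borrower is female. A possible refinement, sketched here in base R on made-up values, puts groups that contain both genders into their own category. The function name `classify_gender` and the "mixed" label are illustrative assumptions, not part of the original analysis.

```r
# Made-up examples of the borrower_genders column
borrower_genders <- c("female", "male", "female, female",
                      "male, female", "male, male", NA)

# Hypothetical helper: one label per loan ("female", "male", "mixed" or NA)
classify_gender <- function(x) {
  parts <- strsplit(x, ",\\s*")
  vapply(parts, function(p) {
    if (length(p) == 1 && is.na(p)) return(NA_character_)
    u <- unique(p)
    if (length(u) > 1) "mixed" else u
  }, character(1))
}

gender_label <- classify_gender(borrower_genders)
print(table(gender_label, useNA = "ifany"))
```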

3.6 Common Loan Term In Months

loans %>%
  filter(!is.na(term_in_months)) %>%
  group_by(term_in_months) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(term_in_months = reorder(term_in_months,Count)) %>%
  head(10) %>%
  ggplot(aes(x = term_in_months,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = term_in_months, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Term in Months', 
       y = 'Count', 
       title = 'Term in Months and Count') +
  coord_flip() +
   theme_bw()

  • As shown above, 14 months is the most common loan term, followed by 8, 11, 7 and 13 months.

3.8 Maps of Loans

We plot the loans on the world map, with the size of the dots proportional to the amount of the loans.

leaflet(themes_region) %>% addTiles() %>%
  addCircles(lng = ~lon, lat = ~lat,radius = ~(amount/10) ,
             color = fillColor2)  %>%
  # controls
  setView(lng=0, lat=0,zoom = 2) 
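In leaflet, `addCircles()` takes its radius in meters, so `amount/10` makes the radius (not the area) proportional to the loan amount. If the goal is for the visible area of each dot to track the amount, one alternative is square-root scaling. The sketch below only computes the radii, using three amounts from the themes data shown earlier. The cap `max_radius_m` is a hypothetical value chosen for illustration.

```r
# Three loan-theme amounts (values as shown in the themes_region glimpse)
amount <- c(450, 20275, 604950)

# Hypothetical cap on the circle radius, in meters
max_radius_m <- 50000

# Square-root scaling: circle area grows linearly with the amount
radius_m <- max_radius_m * sqrt(amount / max(amount))

print(round(radius_m))
```

The resulting vector could be passed to `addCircles(radius = ...)` in place of `amount/10`.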

3.10 Kiva Loans in respective Continents

3.10.1 AFRICA

3.10.1.1 Loan Distribution

The funded loan amount is shown as a histogram. Both the X axis and the Y axis have been log transformed for better visualization.

AfricanLoans <- regions %>%
                  select(country, world_region) %>%
                  unique() %>%
                  inner_join(loans, by = "country") %>%
                  filter(str_detect(world_region,"Africa"))

fundedLoanAmountDistribution(AfricanLoans)

AfricanLoans %>%
   select(funded_amount) %>%
   summary()
##  funded_amount    
##  Min.   :    0.0  
##  1st Qu.:  200.0  
##  Median :  375.0  
##  Mean   :  660.6  
##  3rd Qu.:  725.0  
##  Max.   :50000.0

The bar plot shows the African countries that have been given the most loans. The map of Africa shows the loans as dots.

plotBarPlotLoansInGeography <- function(country_loans)
{
  country_loans %>%
    group_by(country) %>%
    summarise(Count = n()) %>%
    arrange(desc(Count)) %>%
    ungroup() %>%
    mutate(country = reorder(country,Count)) %>%
    head(10) %>%
    ggplot(aes(x = country,y = Count)) +
    geom_bar(stat='identity',colour="white", fill = fillColorLightCoral) +
    geom_text(aes(x = country, y = 1, label = paste0("(",Count,")",sep="")),
              hjust=0, vjust=.5, size = 4, colour = 'black',
              fontface = 'bold') +
    labs(x = 'Countries', 
         y = 'Count', 
         title = 'Countries and Count') +
    coord_flip() +
    theme_bw()
}

plotMapsLoansInGeography <- function(country_loans)
{
  center_lon = median(country_loans$lon,na.rm = TRUE)
  center_lat = median(country_loans$lat,na.rm = TRUE)

  leaflet(country_loans) %>% addTiles() %>%
    addCircles(lng = ~lon, lat = ~lat,radius = ~(amount/10) ,
               color =fillColor2)  %>%
    # controls
    setView(lng=center_lon, lat=center_lat,zoom = 3) 
}

country_loans = themes_region_combined %>%
  filter(str_detect(world_region,"Africa"))

unique(country_loans$world_region)
## [1] "Sub-Saharan Africa"
plotBarPlotLoansInGeography(country_loans)

plotMapsLoansInGeography(country_loans)
  • Kenya, Lesotho, Uganda, Malawi and Ghana are the countries that have received the most loans.

Observations

We observe from the sections Multidimensional Poverty Measures and Distribution of loans in Africa that we do not see loans in the poorest areas. This might be an opportunity for Kiva to help these very underprivileged countries.

3.10.1.2 African Conflicts Data

We extend our analysis to the African Conflicts data found on Kaggle and map the most battle-prone areas in 2016 and 2017. The intention is to highlight that these battle-prone areas may be in need of funds for very basic necessities such as water and food.

# Identify areas where there is Battle
keywordBattle = "Battle"

BattleData = ConflictsData %>% 
  filter(!is.na(LATITUDE)) %>%
  filter(!is.na(LONGITUDE)) %>%
  filter(str_detect(EVENT_TYPE,keywordBattle) )

BattleData20162017 = BattleData %>% filter(YEAR >= 2016)

center_lon = median(BattleData$LONGITUDE)
center_lat = median(BattleData$LATITUDE)

leaflet(BattleData20162017) %>% addTiles() %>%
  addCircles(lng = ~LONGITUDE, lat = ~LATITUDE,radius = ~(FATALITIES), 
             color = fillColor)  %>%
  # controls
  setView(lng=center_lon, lat=center_lat, zoom=3)

3.10.1.3 Battle affected Countries

The following plot shows the high-conflict countries. The count is the number of battles in 2016 and 2017.

HighConflictCountries <- BattleData20162017 %>%
  group_by(COUNTRY) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(COUNTRY = reorder(COUNTRY,Count)) %>%
  head(10)

HighConflictCountries %>%
  ggplot(aes(x = COUNTRY,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = c("red")) +
  geom_text(aes(x = COUNTRY, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Country', 
       y = 'Count', 
       title = 'High Conflict Countries and Count') +
  coord_flip() +
  theme_bw()

We observe that Somalia, South Sudan, Libya, Nigeria and Sudan are the countries most affected by battles. Next we investigate how many loans were given in the major battle-affected countries.

3.10.1.4 Loans in Battle affected Countries

We explore the number of loans provided in the Top Twenty High Conflict countries.

HighConflictCountries <- BattleData20162017 %>%
  group_by(COUNTRY) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  head(20)

HighConflictCountryLoans <- inner_join(AfricanLoans, HighConflictCountries,
                                       by=c("country" = "COUNTRY")) %>%
                            group_by(country) %>%
                            summarise(Count = n()) %>%
                            arrange(desc(Count))

HighConflictCountryLoans %>%
  ungroup() %>%
  mutate(country = reorder(country,Count)) %>%
  ggplot(aes(x = country,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = country, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Battle affected Countries', 
       y = 'Loan Count', 
       title = 'Battle affected Countries and Loan Count') +
  coord_flip() +
   theme_bw()
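The inner join above keeps only countries that appear in both tables, so a conflict country with no Kiva loans simply drops out of the chart. To list such countries explicitly with a count of zero, a left join from the conflict table can be used instead. The base R sketch below uses small made-up tables. All names and numbers in it are illustrative assumptions, not values from the data.

```r
# Made-up battle counts per country
conflict_counts <- data.frame(
  COUNTRY = c("Somalia", "South Sudan", "Nigeria", "Kenya"),
  Battles = c(900, 700, 500, 100)
)

# Made-up loans, one row per loan
toy_loans <- data.frame(
  country = c("Kenya", "Kenya", "Nigeria", "Kenya", "Somalia")
)

# Loans per country
loan_counts <- as.data.frame(table(country = toy_loans$country),
                             responseName = "Loans",
                             stringsAsFactors = FALSE)

# Left join: keep every conflict country, even those without loans
merged <- merge(conflict_counts, loan_counts,
                by.x = "COUNTRY", by.y = "country", all.x = TRUE)
merged$Loans[is.na(merged$Loans)] <- 0

print(merged[order(-merged$Battles), ])
```

With dplyr, `left_join()` followed by replacing the missing counts would give the same table.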

The countries Somalia, South Sudan, Libya and Sudan do not feature much in the Kiva loans. There is a lot of opportunity for people in these countries to leverage Kiva.

3.10.1.5 Use of loans

AfricanLoans <- regions %>%
                select(country, world_region) %>%
                unique() %>%
                inner_join(loans, by = "country") %>%
                filter(str_detect(world_region,"Africa"))

AfricanLoans %>%
  filter(!is.na(use)) %>%
  group_by(use) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(use = reorder(use,Count)) %>%
  head(10) %>%
  ggplot(aes(x = use,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = use, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Use of Loans', 
       y = 'Count', 
       title = 'Use of Loans and Count') +
     coord_flip() +
     theme_bw() 

3.10.2 ASIA

3.10.2.1 Loan Distribution

country_loans = themes_region_combined %>%
  filter(str_detect(world_region,"Asia"))

unique(country_loans$world_region)
## [1] "East Asia and the Pacific" "Europe and Central Asia"  
## [3] "South Asia"
plotBarPlotLoansInGeography(country_loans)

Philippines, Cambodia, Indonesia, Tajikistan and Pakistan have the most loans.

The funded loan amount is shown as a histogram. Both the X axis and the Y axis have been log transformed for better visualization.

AsianLoans <- regions %>%
                select(country,world_region) %>%
                unique() %>%
                inner_join(loans, by = "country") %>%
                filter(str_detect(world_region,"Asia"))

fundedLoanAmountDistribution(AsianLoans)

#### Summary of Funded Amount
AsianLoans %>%
   select(funded_amount) %>%
   summary()
##  funded_amount  
##  Min.   :    0  
##  1st Qu.:  225  
##  Median :  325  
##  Mean   :  497  
##  3rd Qu.:  575  
##  Max.   :50000

3.10.2.2 Poorest Asian Countries

  • Afghanistan, Yemen, Pakistan, India and Bangladesh are the poorest Asian countries by the MPI Rural measure.
  • Afghanistan, Bangladesh, Pakistan, Yemen and India are the poorest Asian countries by the MPI Urban measure.
3.10.2.2.1 Rural
countries_continents = countries_continents %>%
  select(name,Continent)

mpi_national_continent = inner_join(mpi_national, countries_continents,
                                    by=c('Country'= 'name'))

poor_countries_rural <- mpi_national_continent %>%
                        filter(Continent == 'AS') %>%
                        rename(MPIRural = `MPI Rural`) %>%
                        arrange(desc(MPIRural)) %>%
                        head(15) %>%
                        select(Country,MPIRural)

treemap(poor_countries_rural, 
        index="Country", 
        vSize = "MPIRural",  
        title="Poorest Countries Rural Perspective", 
        palette = "RdBu",
        fontsize.title = 14)

3.10.2.2.2 Urban
poor_countries_urban <- mpi_national_continent %>%
                          filter(Continent == 'AS') %>%
                          rename(MPIUrban = `MPI Urban`) %>%
                          arrange(desc(MPIUrban)) %>%
                          head(15) %>%
                          select(Country,MPIUrban)

treemap(poor_countries_urban, 
        index="Country", 
        vSize = "MPIUrban",  
        title="Poorest Countries Urban Perspective", 
        palette = "RdBu",
        fontsize.title = 14)

3.10.2.3 Loans in Poorest Asian countries

poor_countries_loans <- inner_join(AsianLoans, poor_countries_rural,
                                   by=c("country"="Country")) %>%
                        group_by(country) %>%
                        summarise(Count = n()) %>%
                        arrange(desc(Count))

# Poorest countries (rural MPI) that have no Kiva loans in the data
tibble::enframe(setdiff(poor_countries_rural$Country,poor_countries_loans$country), name = NULL)

The countries listed above feature among the poorest Asian countries by the MPI Rural measure but do not feature in the Kiva loans. There is a good opportunity to include them in the Kiva family.

3.10.2.4 Use of loans

  AsianLoans %>%
  filter(!is.na(use)) %>%
  
  group_by(use) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(use = reorder(use,Count)) %>%
  head(10) %>%
  ggplot(aes(x = use,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = use, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Use of Loans', 
       y = 'Count', 
       title = 'Use of Loans and Count') +
  coord_flip() +
  theme_bw() 

  • Philippines, Cambodia, Indonesia, Tajikistan and Pakistan have the most loans.

3.10.3 AMERICAS

3.10.3.1 Loan Distribution

The funded loan amount is shown as a histogram. Both the X axis and the Y axis have been log transformed for better visualization.

AmericasLoans <- regions %>%
                  select(country,world_region) %>%
                  unique() %>%
                  inner_join(loans, by = "country") %>%
                  filter(str_detect(world_region,"America"))

fundedLoanAmountDistribution(AmericasLoans)

AmericasLoans %>%
   select(funded_amount) %>%
   summary()
##  funded_amount     
##  Min.   :     0.0  
##  1st Qu.:   400.0  
##  Median :   600.0  
##  Mean   :   917.5  
##  3rd Qu.:  1000.0  
##  Max.   :100000.0
country_loans = themes_region_combined %>%
  filter(str_detect(world_region,"America"))

unique(country_loans$world_region)
## [1] "Latin America and Caribbean"
plotBarPlotLoansInGeography(country_loans)

3.10.3.2 Poorest South American Countries

  • Peru, Suriname, Colombia, Brazil, Ecuador and Guyana are the poorest countries from the MPI Rural and MPI Urban perspective
3.10.3.2.1 Rural
poor_countries_rural <- mpi_national_continent %>%
  filter(Continent == 'SA') %>%
  rename(MPIRural = `MPI Rural`) %>%
  arrange(desc(MPIRural)) %>%
  head(15) %>%
  select(Country,MPIRural)

treemap(poor_countries_rural, 
        index="Country", 
        vSize = "MPIRural",  
        title="Poorest Countries Rural Perspective", 
        palette = "RdBu",
        fontsize.title = 14)

3.10.3.2.2 Urban
poor_countries_urban <- mpi_national_continent %>%
  filter(Continent == 'SA') %>%
  rename(MPIUrban = `MPI Urban`) %>%
  arrange(desc(MPIUrban)) %>%
  head(15) %>%
  select(Country,MPIUrban)

treemap(poor_countries_urban, 
        index="Country", 
        vSize = "MPIUrban",  
        title="Poorest Countries Urban Perspective", 
        palette = "RdBu",
        fontsize.title = 14)

3.10.3.3 Loans in Poorest South American countries

poor_countries_loans <- inner_join(AmericasLoans, poor_countries_rural,
                                   by =c("country" = "Country")) %>%
                        group_by(country) %>%
                        summarise(Count = n()) %>%
                        arrange(desc(Count))

# Poorest countries (rural MPI) that have no Kiva loans in the data
tibble::enframe(setdiff(poor_countries_rural$Country,poor_countries_loans$country), name = NULL)
  • The countries listed above feature among the poorest South American countries by the MPI Rural measure but do not feature in the Kiva loans. There is a good opportunity to include them in the Kiva family.

3.10.3.4 Use of loans

AmericasLoans <- regions %>%
                  select(country,world_region) %>%
                  unique() %>%
                  inner_join(loans, by = "country") %>%
                  filter(str_detect(world_region,"America"))

AmericasLoans %>%
  filter(!is.na(use)) %>%
  group_by(use) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(use = reorder(use,Count)) %>%
  head(10) %>%
  ggplot(aes(x = use,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = use, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Use of Loans', 
       y = 'Count', 
       title = 'Use of Loans and Count') +
     coord_flip() +
     theme_bw() 

3.11 Kiva Loans in respective Countries

3.11.1 Kenya

3.11.1.1 Loans Distribution

We show the different types of loans in Kenya in the map.

country_loans = loans %>%
    filter(country == "Kenya")

fundedLoanAmountDistribution(country_loans)

summary(country_loans$funded_amount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   200.0   300.0   425.3   500.0 50000.0
country_loans = themes_region %>% 
  filter(country == "Kenya") %>%
  rename (themeType = `Loan Theme Type`) 

center_lon = median(country_loans$lon,na.rm = TRUE)
center_lat = median(country_loans$lat,na.rm = TRUE)

leaflet(country_loans) %>% addTiles() %>%
  addCircles(lng = ~lon, lat = ~lat,radius = ~(amount/100) ,
             color = ~c("blue"))  %>%
  # controls
  setView(lng=center_lon, lat=center_lat,zoom = 5) 

3.11.1.2 Funding Partners

We plot the most dominant funding partners of Kiva in Kenya.

country_loans %>%
  rename(FieldPartnerName =`Field Partner Name`) %>%
  group_by(FieldPartnerName) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(FieldPartnerName = reorder(FieldPartnerName,Count)) %>%
  head(10) %>%
  ggplot(aes(x = FieldPartnerName,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = FieldPartnerName, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Field Partner Name', 
       y = 'Count', 
       title = 'Field Partner Name and Count') +
  coord_flip() +
   theme_bw()

3.11.1.7 Loans Data

datatable(loansData, style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))

3.11.2 India

3.11.2.1 Loan Distribution

country_loans = loans %>%
    filter(country == "India")

fundedLoanAmountDistribution(country_loans)

summary(country_loans$funded_amount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   250.0   325.0   575.5   550.0 12925.0

We show the different types of loans in India on the map. We observe that a major part of India is not utilizing Kiva. An awareness campaign about Kiva in India might be a good idea.

country_loans = themes_region %>% 
  filter(country == "India") %>%
  rename (themeType = `Loan Theme Type`) 

center_lon = median(country_loans$lon,na.rm = TRUE)
center_lat = median(country_loans$lat,na.rm = TRUE)

leaflet(country_loans) %>% addTiles() %>%
  addCircles(lng = ~lon, lat = ~lat,radius = ~(amount/10) ,
             color = ~c("blue"))  %>%
  # controls
  setView(lng=center_lon, lat=center_lat,zoom = 5) 

3.11.2.2 Dominant Funding Partner

We plot the most dominant funding partners of Kiva in India.

country_loans %>%
  rename(FieldPartnerName =`Field Partner Name`) %>%
  group_by(FieldPartnerName) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(FieldPartnerName = reorder(FieldPartnerName,Count)) %>%
  head(10) %>%
  ggplot(aes(x = FieldPartnerName,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = FieldPartnerName, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Field Partner Name', 
       y = 'Count', 
       title = 'Field Partner Name and Count') +
  coord_flip() +
   theme_bw()

3.11.2.6 Loans Trend

We show the trend of loans from 2014 onwards. The trend shows that the number of loans keeps increasing with time.

loansData = loans %>%
              filter(country == "India") %>%
              filter(!is.na(funded_time)) %>%
              mutate(year = year(ymd_hms(funded_time))) %>%
              mutate(month = month(ymd_hms(funded_time))) %>%
              filter(!is.na(year)) %>%
              filter(!is.na(month)) %>%
              group_by(year,month) %>%
              summarise(Count = n()) %>%
              mutate(YearMonth = make_date(year=year,month=month) ) 
  
loansData %>%
  ggplot(aes(x=YearMonth,y=Count,group = 1)) +
  geom_line(size=1, color="red")+
  geom_point(size=3, color="red") +
  labs(x = 'Time', y = 'Count',title = 'Trend of loans') +
  theme_bw() 
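The monthly aggregation behind this trend can be sketched on toy data. The sketch assumes timestamps in the same "YYYY-MM-DD HH:MM:SS" shape as funded_time. The toy_loans data frame and its values are hypothetical, and floor_date() is shown as one alternative to building separate year and month columns, not as the method used in this report.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps; one missing value to mimic unfunded loans
toy_loans <- data.frame(
  funded_time = c("2014-01-05 10:00:00", "2014-01-20 12:30:00",
                  "2014-02-03 08:15:00", NA, "2014-03-11 09:00:00"),
  stringsAsFactors = FALSE
)

monthly_counts <- toy_loans %>%
  filter(!is.na(funded_time)) %>%
  # parse the timestamp, drop the time of day, then snap to the first of the month
  mutate(YearMonth = floor_date(as_date(ymd_hms(funded_time)), unit = "month")) %>%
  count(YearMonth, name = "Count")

print(monthly_counts)
```

Because YearMonth is a Date, it can be passed directly to the x aesthetic of a line plot.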

3.11.2.7 Loans data

datatable(loansData, style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))

3.11.3 El Salvador

3.11.3.1 Loan Distribution

We show the different types of loans in El Salvador on the map.

country_loans = loans %>%
    filter(country == "El Salvador")

fundedLoanAmountDistribution(country_loans)

summary(country_loans$funded_amount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   325.0   500.0   585.8   800.0  2900.0

country_loans = themes_region %>% 
  filter(country == "El Salvador") %>%
  rename (themeType = `Loan Theme Type`) 

center_lon = median(country_loans$lon,na.rm = TRUE)
center_lat = median(country_loans$lat,na.rm = TRUE)

leaflet(country_loans) %>% addTiles() %>%
  addCircles(lng = ~lon, lat = ~lat,radius = ~(amount/100) ,
             color = ~c("blue"))  %>%
  # controls
  setView(lng=center_lon, lat=center_lat,zoom = 7) 

3.11.3.2 Dominant Field Partner

We plot the most dominant field partners of Kiva in El Salvador.

country_loans %>%
  rename(FieldPartnerName =`Field Partner Name`) %>%
  group_by(FieldPartnerName) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(FieldPartnerName = reorder(FieldPartnerName,Count)) %>%
  head(10) %>%
  ggplot(aes(x = FieldPartnerName,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = FieldPartnerName, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Field Partner Name', 
       y = 'Count', 
       title = 'Field Partner Name and Count') +
  coord_flip() +
  theme_bw()

3.11.3.7 Loans Data

datatable(loansData, style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))

3.12 Multidimensional Poverty Measures

We use the Kaggle dataset Multidimensional Poverty Measures. The following explanation of the term is taken from the dataset documentation:

  • Most countries of the world define poverty as a lack of money. Yet poor people themselves consider their experience of poverty much more broadly. A person who is poor can suffer from multiple disadvantages at the same time - for example they may have poor health or malnutrition, a lack of clean water or electricity, poor quality of work or little schooling. Focusing on one factor alone, such as income, is not enough to capture the true reality of poverty.

  • Multidimensional poverty measures can be used to create a more comprehensive picture. They reveal who is poor and how they are poor - the range of different disadvantages they experience. As well as providing a headline measure of poverty, multidimensional measures can be broken down to reveal the poverty level in different areas of a country, and among different sub-groups of people.

3.12.1 MPI Map

The higher the MPI, the poorer the country. The map shows that the poorer countries are concentrated in Africa. Red dots indicate poorer countries.

pal <- colorNumeric(
  palette = colorRampPalette(c('green', 'red'))(length(regions$MPI)), 
  domain = regions$MPI)

regions_no_NA = regions %>%
  filter(!is.na(lon)) %>%
  filter(!is.na(lat))

center_lon = median(regions$lon,na.rm = TRUE)
center_lat = median(regions$lat,na.rm = TRUE)

leaflet(data = regions_no_NA) %>%
  addTiles() %>%
  addCircleMarkers(
    lng =  ~ lon,
    lat =  ~ lat,
    radius = ~ MPI*10,
    popup =  ~ country,
    color =  ~ pal(MPI)
  ) %>%
  # controls
  setView(lng=center_lon, lat=center_lat,zoom = 3) %>%
  addLegend("topleft", pal = pal, values = ~MPI,
            title = "MPI Map",
            opacity = 1)

3.12.2 MPI Rural

We use the MPI Rural metric from the Kaggle dataset Multidimensional Poverty Measures. This metric is the multidimensional poverty index for the rural areas of a country. It gives Kiva an understanding of the countries where loans are most needed.

mpi_national %>%
  rename(mpi_rural = `MPI Rural`) %>%
  arrange(desc(mpi_rural)) %>%
  mutate(Country = reorder(Country,mpi_rural)) %>%
  head(10) %>%
  
  ggplot(aes(x = Country,y = mpi_rural)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = Country, y = 1, label = paste0("(",mpi_rural,")",sep="")),
            hjust=1, vjust=1, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Country', 
       y = 'mpi_rural', 
       title = 'Country and MPI Rural') +
  coord_flip() +
  theme_bw()

  • Niger, Somalia, Ethiopia, Burkina Faso and Chad are the poorest by the MPI Rural measure.

3.12.3 MPI Urban

We use the MPI Urban metric from the Kaggle dataset Multidimensional Poverty Measures. This metric is the multidimensional poverty index for the urban areas of a country. It gives Kiva an understanding of the countries where loans are most needed.

mpi_national %>%
  rename(mpi_urban = `MPI Urban`) %>%
  arrange(desc(mpi_urban)) %>%
  mutate(Country = reorder(Country,mpi_urban)) %>%
  head(10) %>%
  
  ggplot(aes(x = Country,y = mpi_urban)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = Country, y = 1, label = paste0("(",mpi_urban,")",sep="")),
            hjust=1, vjust=1, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Country', 
       y = 'mpi_urban', 
       title = 'Country and MPI Urban') +
  coord_flip() +
  theme_bw()

  • South Sudan, Chad, Somalia, Liberia and Central African Republic are the poorest by the MPI Urban measure.

3.13 MPI Countries and Kiva Loans

We show the High MPI Rural and Low MPI Rural countries with the most loans. Mali, Sierra Leone, Liberia, Mozambique and Burkina Faso are the High MPI Rural countries with the most loans. Armenia, Kyrgyzstan, Albania, Ukraine and Thailand are the Low MPI Rural countries with the most loans.

3.13.1 High MPI Countries with Kiva Loans

mpi_national_rural_top_10 = mpi_national %>%
  rename(mpi_rural = `MPI Rural`) %>%
  arrange(desc(mpi_rural)) %>%
  mutate(Country = reorder(Country,mpi_rural)) %>%
  head(15)

mpi_national_rural_top_10_loans <- loans %>%
                                    inner_join(mpi_national_rural_top_10,
                                               by=c("country" = "Country"))

getTopLoansByCountry <- function(dataset,fillColorName) {
  dataset %>%
    group_by(country) %>%
    summarise(Count = n()) %>%
    arrange(desc(Count)) %>%
    ungroup() %>%
    mutate(country = reorder(country,Count)) %>%
    head(10) %>%
    ggplot(aes(x = country,y = Count)) +
    geom_bar(stat='identity',colour="white", fill = fillColorName) +
    geom_text(aes(x = country, y = 1, label = paste0("(",Count,")",sep="")),
              hjust=0, vjust=.5, size = 4, colour = 'black',
              fontface = 'bold') +
    labs(x = 'Country', 
         y = 'Count', 
         title = 'Country and Count') +
    coord_flip() +
     theme_bw()
}

getTopLoansByCountry(mpi_national_rural_top_10_loans,fillColor2)

  • The chart above shows the High MPI Rural countries with the most loans.
  • Mali, Sierra Leone, Liberia, Mozambique and Burkina Faso are the High MPI Rural countries with the most loans.

3.13.2 Low MPI Countries with Kiva Loans

mpi_national_rural_bottom_10 = mpi_national %>%
  rename(mpi_rural = `MPI Rural`) %>%
  arrange(mpi_rural) %>%
  mutate(Country = reorder(Country,mpi_rural)) %>%
  head(15) 

mpi_national_rural_bottom_10 = loans %>%
                                inner_join(mpi_national_rural_bottom_10,
                                           by =c("country" = "Country"))

getTopLoansByCountry(mpi_national_rural_bottom_10,fillColor)
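The inner joins above match on country names, so an MPI country that is spelled differently in the loans table drops out silently. A minimal sketch for listing such countries with anti_join() follows. The toy_mpi and toy_loans data frames, their names and their values are hypothetical and only illustrate the check.

```r
library(dplyr)

# Hypothetical MPI table; names and values are made up for illustration
toy_mpi <- data.frame(
  Country   = c("Niger", "Chad", "Congo, Democratic Republic of the"),
  mpi_rural = c(0.6, 0.5, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical loans table that spells one country differently
toy_loans <- data.frame(
  country       = c("Niger", "Niger", "The Democratic Republic of the Congo"),
  funded_amount = c(250, 400, 300),
  stringsAsFactors = FALSE
)

# MPI countries with no matching loan rows; these vanish after inner_join()
unmatched <- toy_mpi %>%
  anti_join(toy_loans, by = c("Country" = "country"))

print(unmatched$Country)
```

An unmatched country either has no Kiva loans at all or is spelled differently in the two tables, and the two cases lead to different conclusions.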

3.14 Use of loans and MPI

In High MPI countries, loans are used to buy condiments, sheep for resale, construction materials, building materials, and fertilizer for groundnuts, okra and peanuts. In Low MPI countries, loans are used to buy cows, to pay for higher education, to buy livestock to increase a herd, and to buy sheep.

3.14.1 Use of loans in High MPI Rural Countries

We show the use of loans in the High MPI Rural countries.

mpi_national_rural_top_10_loans %>%
  filter(!is.na(use)) %>%
  group_by(use) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(use = reorder(use,Count)) %>%
  head(10) %>%
  ggplot(aes(x = use,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor) +
  geom_text(aes(x = use, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Use of Loans', 
       y = 'Count', 
       title = 'Use of Loans and Count in High MPI countries') +
     coord_flip() +
     theme_bw() 

3.14.2 Use of loans in Low MPI Rural Countries

We show the use of loans in the Low MPI Rural countries.

mpi_national_rural_bottom_10 %>%
  filter(!is.na(use)) %>%
  group_by(use) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(use = reorder(use,Count)) %>%
  head(10) %>%
  ggplot(aes(x = use,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor) +
  geom_text(aes(x = use, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Use of Loans', 
       y = 'Count', 
       title = 'Use of Loans and Count in Low MPI countries') +
     coord_flip() +
     theme_bw() 

3.16 Distribution of the Funded Loan amount

  • The median funded loan amount in High MPI countries is $600
  • The median funded loan amount in Low MPI countries is $1100

3.16.1 Distribution of Funded Loan amount in High MPI countries

The funded loan amount is shown in the form of a histogram. The Y axis and the X axis have been log transformed for better visualization.

fundedLoanAmountDistribution(mpi_national_rural_top_10_loans)

Summary of Funded Amount in High MPI countries

mpi_national_rural_top_10_loans %>%
   select(funded_amount) %>%
   summary()
##  funded_amount    
##  Min.   :    0.0  
##  1st Qu.:  275.0  
##  Median :  600.0  
##  Mean   :  930.3  
##  3rd Qu.: 1175.0  
##  Max.   :50000.0

3.16.2 Distribution of Funded Loan amount in Low MPI countries

The funded loan amount is shown in the form of a histogram. The Y axis and the X axis have been log transformed for better visualization.

fundedLoanAmountDistribution(mpi_national_rural_bottom_10)

Summary of Funded Amount in Low MPI countries

mpi_national_rural_bottom_10 %>%
   select(funded_amount) %>%
   summary()
##  funded_amount  
##  Min.   :    0  
##  1st Qu.:  725  
##  Median : 1100  
##  Mean   : 1281  
##  3rd Qu.: 1675  
##  Max.   :50000

3.17 Poorest Regions

We explore the poorest regions in the world, using the Intensity of Deprivation Regional metric.

Lac in Chad is the poorest region, followed by Affar in Ethiopia, Est in Burkina Faso, and Ouaddaï and Wadi Fira in Chad.

poorest_regions = mpi_subnational %>%
  rename(intensity_deprivation_regional = `Intensity of deprivation Regional`) %>%
  rename(sub_national_region = `Sub-national region`) %>%
  arrange(desc(intensity_deprivation_regional)) 
  
datatable(head(poorest_regions,10), style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))

3.18 Human Development Index

Sierra Leone, Eritrea, South Sudan, Mozambique, Guinea, Burundi, Burkina Faso, Chad, Niger and Central African Republic are the countries with the lowest Human Development Index. Kiva should direct loans to these countries.

GEconV4_lat_lon = GEconV4 %>%
  group_by(COUNTRY) %>%
  mutate(lat = median(LAT,na.rm = TRUE)) %>%
  mutate(lon =  median(LONGITUDE,na.rm = TRUE)) %>%
  select(COUNTRY,lat,lon) %>%
  unique()

country_stats_lat_lon = inner_join(country_stats, GEconV4_lat_lon, 
                                   by=c('country_name'='COUNTRY'))

country_stats %>%
  arrange(hdi) %>%
  mutate(Country = reorder(country_name,hdi)) %>%
  head(10) %>%
  ggplot(aes(x = Country,y = hdi)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = Country, y = 1, label = paste0("(",round(hdi,3),")",sep="")),
            hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Country', 
       y = 'hdi', 
       title = 'Countries with Lowest Human Development Index') +
  coord_flip() +
  theme_bw()

3.18.1 Human Development Index Map

pal <- colorNumeric(
  palette = colorRampPalette(c('red', 'green'))(length(country_stats$hdi)), 
  domain = country_stats$hdi)

country_stats_lat_lon_no_NA = country_stats_lat_lon %>%
  filter(!is.na(hdi)) %>%
  filter(!is.na(lon)) %>%
  filter(!is.na(lat))

center_lon = median(country_stats_lat_lon_no_NA$lon,na.rm = TRUE)
center_lat = median(country_stats_lat_lon_no_NA$lat,na.rm = TRUE)

leaflet(data = country_stats_lat_lon_no_NA) %>%
  addTiles() %>%
  addCircleMarkers(
                    lng =  ~ lon,
                    lat =  ~ lat,
                    radius = ~ hdi*10,
                    popup =  ~ country_name,
                    color =  ~ pal(hdi)
                  ) %>%
  setView(lng=center_lon, lat=center_lat,zoom = 2) %>% # controls
  addLegend("topleft", pal = pal, values = ~hdi,
            title = "Human Development Index Map",
            opacity = 1)

  • The redder the circle, the lower the Human Development Index. The African continent has very low HDI, and Kiva should focus its loans on this belt.

3.18.2 Population below Poverty Line

country_stats %>%
  arrange(desc(population_below_poverty_line)) %>%
  mutate(Country = reorder(country_name,population_below_poverty_line)) %>%
  head(10) %>%
  ggplot(aes(x = Country,y = population_below_poverty_line)) +
  geom_bar(stat='identity',colour="white", fill = fillColorLightCoral) +
  geom_text(aes(x = Country, y = 1, label = paste0("(",population_below_poverty_line,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Country', 
       y = 'Population Below Poverty Line', 
       title = 'Countries with Highest Population Below Poverty Line') +
  coord_flip() +
  theme_bw()

  • Syria, Zimbabwe, Madagascar, Sierra Leone, Suriname, Nigeria, Guinea-Bissau, Burundi, Swaziland and Democratic Republic of Congo are the countries with the highest share of population below the poverty line.

3.18.3 Population under Poverty Line Map

pal <- colorNumeric(
  palette = colorRampPalette(c('green', 'red'))(length(country_stats$population_below_poverty_line)), 
  domain = country_stats$population_below_poverty_line)

country_stats_lat_lon_no_NA = country_stats_lat_lon %>%
  filter(!is.na(population_below_poverty_line)) %>%
  filter(!is.na(lon)) %>%
  filter(!is.na(lat))

center_lon = median(country_stats_lat_lon_no_NA$lon,na.rm = TRUE)
center_lat = median(country_stats_lat_lon_no_NA$lat,na.rm = TRUE)

leaflet(data = country_stats_lat_lon_no_NA) %>%
  addTiles() %>%
  addCircleMarkers(
    lng =  ~ lon,
    lat =  ~ lat,
    radius = ~ population_below_poverty_line/10,
    popup =  ~ country_name,
    color =  ~ pal(population_below_poverty_line)
  ) %>%
  # controls
  setView(lng=center_lon, lat=center_lat,zoom = 2) %>%
  
  addLegend("topleft", pal = pal, values = ~population_below_poverty_line,
          title = "Population under Poverty Line Map",
           opacity = 1)

3.19 Top Field Partners

The top ten field partners, which account for most of the loans, are shown below.

themes_region %>%
  rename(FieldPartnerName =`Field Partner Name`) %>%
  group_by(FieldPartnerName) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(FieldPartnerName = reorder(FieldPartnerName,Count)) %>%
  head(10) %>%
  ggplot(aes(x = FieldPartnerName,y = Count)) +
  geom_bar(stat='identity',colour="white", fill = fillColor) +
  geom_text(aes(x = FieldPartnerName, y = 1, label = paste0("(",Count,")",sep="")),
            hjust=0, vjust=.5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Field Partner Name', 
       y = 'Count', 
       title = 'Field Partner Name and Count') +
  coord_flip() +
   theme_bw()
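The group, count and plot pattern above recurs throughout this report, so it could be factored into a small helper. The following is only a sketch under that assumption. The names top_counts, plot_top_counts and toy_themes are hypothetical and do not appear in the original analysis.

```r
library(dplyr)
library(ggplot2)

# Count rows per value of `column` and keep the n most frequent values
top_counts <- function(data, column, n = 10) {
  data %>%
    count({{ column }}, name = "Count", sort = TRUE) %>%
    head(n)
}

# Horizontal bar chart of the top n values, labelled with their counts
plot_top_counts <- function(data, column, n = 10, fill = "steelblue") {
  top_counts(data, {{ column }}, n) %>%
    mutate(item = reorder({{ column }}, Count)) %>%
    ggplot(aes(x = item, y = Count)) +
    geom_col(colour = "white", fill = fill) +
    geom_text(aes(y = 1, label = paste0("(", Count, ")")),
              hjust = 0, vjust = .5, size = 4, fontface = "bold") +
    labs(x = NULL, y = "Count") +
    coord_flip() +
    theme_bw()
}

# Hypothetical data standing in for themes_region
toy_themes <- data.frame(
  partner = c("P1", "P2", "P1", "P3", "P1", "P2"),
  stringsAsFactors = FALSE
)

top2 <- top_counts(toy_themes, partner, n = 2)
p <- plot_top_counts(toy_themes, partner, n = 2)
print(top2)
```

With such a helper, each per-country chart reduces to one call after renaming the column, which keeps titles and labels consistent across sections.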

3.20 Naive Poverty Metric

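The Naive Poverty Metric is the sum of four components: the share of the population below the poverty line (as a fraction), one minus the Human Development Index, MPI Urban and MPI Rural. A higher value indicates a poorer country.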

country_stats = left_join(country_stats,mpi_national,
                          by =c('country_name'= 'Country'))

country_stats_AG = country_stats %>% 
                    select(population_below_poverty_line,hdi,country_name,`MPI Urban`,`MPI Rural`) %>%
                    # sum of the four poverty components; a higher value means a poorer country
                    mutate(AGMetric = population_below_poverty_line/100 + 
                                      (1-hdi) +
                                      `MPI Urban` + `MPI Rural`)
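A worked example may make the metric easier to follow. The sketch below uses a hypothetical toy_stats data frame with made-up values and simplified column names (mpi_urban, mpi_rural). It also shows that a country with missing MPI values, as can happen after a left join, gets an NA metric.

```r
# Hypothetical countries; all values are made up for illustration
toy_stats <- data.frame(
  country_name = c("A", "B", "C"),
  population_below_poverty_line = c(50, 20, 40),  # percent
  hdi       = c(0.4, 0.7, 0.5),
  mpi_urban = c(0.2, 0.05, NA),                   # NA: no MPI row matched
  mpi_rural = c(0.5, 0.10, NA),
  stringsAsFactors = FALSE
)

# Same four-term sum as the Naive Poverty Metric
toy_stats$AGMetric <- with(
  toy_stats,
  population_below_poverty_line / 100 + (1 - hdi) + mpi_urban + mpi_rural
)

print(toy_stats[, c("country_name", "AGMetric")])
```

For country A the four terms are 0.5, 0.6, 0.2 and 0.5, which sum to 1.8. Country C is dropped from any ranking unless the missing MPI values are handled explicitly.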

3.20.1 Distribution of the Naive Poverty Metric

country_stats_AG %>%
    ggplot(aes(x = AGMetric)) +
    geom_histogram(fill = fillColorLightCoral,bins=100) +
    labs(x = 'Naive Poverty Metric' ,y = 'Count', title = paste("Distribution of", "Naive Poverty Metric")) +   
    theme_bw()

3.20.2 Poorest Countries based on Naive Poverty Metric

country_stats_AG %>%
  arrange(desc(AGMetric)) %>%
  mutate(Country = reorder(country_name,AGMetric)) %>%
  head(10) %>%
  ggplot(aes(x = Country,y = AGMetric)) +
  geom_bar(stat='identity',colour="white", fill = fillColor2) +
  geom_text(aes(x = Country, y = 1, label = paste0("(",round(AGMetric,2),")",sep="")),
            hjust=1, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Country', 
       y = 'AGMetric', 
       title = 'Countries with Highest Naive Poverty Metric') +
  coord_flip() +
  theme_bw()

  • Nigeria, Guinea, Burkina Faso, Liberia, Burundi, Guinea-Bissau, Chad, Niger, Sierra Leone and South Sudan are the poorest according to the Naive Poverty Metric

4 Part 4 - Inference

4.1 Statistics

Exploratory data analysis gives the following statistics.

Statistic                Variable       Value
Population Mean          Loan Amount     842.3971067
Population SD            Loan Amount    1198.6600729
Sample Statistics Mean   Loan Amount     842.3971067
Sample Statistics SD     Loan Amount    1198.6600729

4.2 Confidence interval of Loan Amount

The point estimate from the sample, with its confidence interval, is shown below.

inference(y=loans$loan_amount, est="mean", null=0, type="ci", conflevel=0.95, method="theoretical")

For this estimate, let's check the total sample size required.

s = sd(loans$loan_amount)
# z* for a 95% confidence level is qnorm(0.975), about 1.96
n = ((qnorm(0.975)*s)/0.03)^2 

For a margin of error of 0.03 (in the units of the loan amount), we would need around 6.13 × 10^9 samples.
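The usual sample-size formula for estimating a mean is n = (z* × s / ME)^2, where z* is the critical value for the chosen confidence level, s is the standard deviation and ME is the margin of error. A minimal sketch follows. The helper name required_sample_size and the inputs are hypothetical and are not the values used in this report.

```r
# Sample size needed to estimate a mean within `margin` at `conf_level`
required_sample_size <- function(s, margin, conf_level = 0.95) {
  # two-sided critical value, e.g. about 1.96 for 95% confidence
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z_star * s / margin)^2)
}

# Illustrative inputs: standard deviation of 1200 and a margin of error of 50
n_needed <- required_sample_size(s = 1200, margin = 50)
print(n_needed)
```

The margin must be in the same units as s. Halving the margin roughly quadruples the required sample size.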

  • Some countries have received higher loan amounts than others.
  • Countries with the poorest MPI are not yet fully funded.

4.3 Conditions for the least squares line

We are going to validate the conditions for the least squares line.

4.3.1 1. Linearity

The chart below shows a very slight upward relationship between Term in Months and Count. The linear relationship is weak because of the large variability in the data.

loans_TermCount <- as.data.frame(data.table::rbindlist(list(AfricanLoans,AsianLoans,AmericasLoans))) %>%
                    filter(!is.na(term_in_months)) %>%
                    mutate(term_in_months = (as.numeric(term_in_months))) %>%
                    group_by(term_in_months) %>%
                    summarise(Count = n()) %>%
                    # keep term_in_months numeric so the scatter plot and lm smoother use a continuous x axis
                    arrange(desc(Count))
loans_TermCount %>%
  ggplot( aes(x=term_in_months, y=Count)) +
  geom_point(size=1,alpha=0.8) +
  geom_smooth(method = "lm") + 
  ggtitle("Term in Months vs Count")

#cor(loans_TermCount$term_in_months, loans_TermCount$Count)

4.3.2 2. Nearly normal residuals

loans_lm <- lm(loan_amount ~ term_in_months + lender_count, loans)
df_residuals <- broom::augment(loans_lm)

Let’s check the residuals normality with histogram and qqplot.

#Histogram plot of residuals
ggplot(df_residuals,aes(x = .resid)) + geom_histogram() + ggtitle("Residual Histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#QQ norm plot of the residuals
qqnorm(loans_lm$residuals)
qqline(loans_lm$residuals) 

The plots show that the residuals are slightly left skewed but otherwise nearly normal.

4.3.3 3. Constant Variability

df_residuals <- filter(df_residuals,.fitted >-17)
ggplot(df_residuals,aes(x=.fitted,y=.resid)) + geom_point(size=1,alpha=0.8) + geom_smooth(method = "lm")  + ggtitle("Loans vs Residuals") 

ggplot(df_residuals,aes(x=.fitted,y=abs(.resid))) + geom_point(size=1,alpha=0.8) + ggtitle("Loans vs Residuals")

The plots above show roughly constant variability of the residuals.

4.4 Different purpose of loan request

Is the funded loan amount equal across different purposes of loan request?

Let’s test whether the loan amount varies with the purpose of the loan.

#Hypothesis test between purpose
statsr::inference(y=loans$loan_amount, x=loans$use, est="mean", null=0, alternative="greater", type="ht", method="theoretical")

The output above shows that the mean loan amount differs across loan purposes.
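A comparison of means across several groups is a one-way ANOVA. As a hedged cross-check, the same kind of test can be sketched with base R's aov() on simulated data. The toy data frame, its group labels and its values are hypothetical, and aov() is used here as a stand-in, not as the function used in this report.

```r
set.seed(1)

# Simulated loan amounts for three hypothetical loan purposes
toy <- data.frame(
  use = rep(c("purpose_a", "purpose_b", "purpose_c"), each = 30),
  loan_amount = c(rnorm(30, mean = 300, sd = 50),
                  rnorm(30, mean = 500, sd = 50),
                  rnorm(30, mean = 800, sd = 50)),
  stringsAsFactors = FALSE
)

# H0: all group means are equal; HA: at least one mean differs
fit <- aov(loan_amount ~ use, data = toy)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

print(summary(fit))
```

A small p-value rejects the hypothesis that all group means are equal. It does not say which purposes differ from each other.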

Based on the above analysis of Kiva loans by region, we can infer the following:

Section : African Loans
  • Kenya, Lesotho, Uganda, Malawi and Ghana are the countries which have received the most loans.
  • From the sections Multidimensional Poverty Measures and Distribution of loans in Africa, we observe that we do not see loans in the poorest areas. This might be an opportunity for Kiva to help these very underprivileged countries.
  • We extend our analysis to the African Conflicts data found on Kaggle and map the most battle-prone areas in 2016 and 2017. The intention is to highlight that these battle-prone areas may be in need of funds for very basic necessities such as water and food. We find that the most battle-prone countries, Somalia, South Sudan, Libya and Sudan, do not feature much in the Kiva loans. There is a lot of opportunity for people in these countries to leverage Kiva.
  • We observe that the African median funded amount ($375) is less than the world median funded amount ($450). Kiva can channel more funds to the African continent, since smaller amounts can help there.
Section : American Loans
  • Peru, Suriname, Colombia, Brazil, Ecuador and Guyana are the poorest countries from the MPI Rural and MPI Urban perspective.
Section : Asian Loans
  • Philippines, Cambodia, Indonesia, Tajikistan and Pakistan have the most loans.
  • Afghanistan, Yemen, Pakistan, India and Bangladesh are the poorest Asian countries by the MPI Rural measure. Afghanistan, Bangladesh, Pakistan, Yemen and India are the poorest Asian countries by the MPI Urban measure.
  • We see that Yemen, Bangladesh, Myanmar and Iraq, though among the poorest countries, do not feature in Kiva loans. There is a good opportunity for them to be included in the Kiva family.

5 Part 5 - Conclusion

Overall, we can draw the following conclusions about the effect of Kiva loans from a global perspective.
Section : Global view of Kiva loans
  • The most popular themes are General and Underserved, followed by Agriculture, Rural Inclusion, Water and Higher Education
  • The most popular sectors are Agriculture, Food, Retail, Services and Personal Use
  • The most popular activities for usage of loans are Farming, General Store, Personal Housing Expenses, Food Production/Sales and Agriculture
  • The most popular uses of loans are to buy a water filter, to construct a sanitary toilet, to buy ingredients for a food production business, to buy groceries to sell, and to buy food for pigs
  • The median funded amount is $450 and the mean funded amount is $786
  • 14 months is the most common term for the loans, followed by 8, 11, 7 and 13 months
  • Women get more loans than men
  • The countries which have received the most loans are Philippines, Kenya, El Salvador, Cambodia and Pakistan

Section : Multidimensional Poverty Measures
  • Niger, Somalia, Ethiopia, Burkina Faso and Chad are the poorest by the MPI Rural measure
  • South Sudan, Chad, Somalia, Liberia and Central African Republic are the poorest by the MPI Urban measure
  • Lac in Chad is the poorest region, followed by Affar in Ethiopia, Est in Burkina Faso, and Ouaddaï and Wadi Fira in Chad.
  • Mali, Sierra Leone, Liberia, Mozambique and Burkina Faso are the High MPI Rural countries with the most loans. Armenia, Kyrgyzstan, Albania, Ukraine and Thailand are the Low MPI Rural countries with the most loans
  • In High MPI countries, loans are used to buy condiments, sheep for resale, construction materials, building materials, and fertilizer for groundnuts, okra and peanuts. In Low MPI countries, loans are used to buy cows, to pay for higher education, to buy livestock to increase a herd, and to buy sheep
  • Food, Retail, Agriculture, Clothing and Services are the most popular sectors for loans in High MPI Rural countries. Agriculture, Education, Health, Housing and Clothing are the most popular sectors for loans in Low MPI Rural countries
  • High MPI countries have a median funded loan amount of $600 and Low MPI countries have a median funded loan amount of $1100

Section : Human Development Index
  • Sierra Leone, Eritrea, South Sudan, Mozambique, Guinea, Burundi, Burkina Faso, Chad, Niger and Central African Republic are the countries with the lowest Human Development Index
Section : Population below Poverty Line
  • Syria, Zimbabwe, Madagascar, Sierra Leone, Suriname, Nigeria, Guinea-Bissau, Burundi, Swaziland and Democratic Republic of Congo are the countries with the highest share of population below the poverty line.

Section : Naive Poverty Metric
  • The Naive Poverty Metric is calculated from the Population below Poverty Line, Human Development Index, MPI Urban and MPI Rural
  • Nigeria, Guinea, Burkina Faso, Liberia, Burundi, Guinea-Bissau, Chad, Niger, Sierra Leone and South Sudan are the poorest according to the Naive Poverty Metric

This project gave me exposure to the treemap and leaflet packages and a chance to apply them in the wonderful world of Kiva.

Debabrata Kabiraj

May 07, 2019