This report documents the key demographics and drivers of financial inclusion in a community.
Financial Inclusion: individuals aged 18 and above who use financial products/services, whether formal or informal.
Financial Exclusion: individuals aged 18 and above who do not have or use any financial products/services, formal or informal.
Informal: individuals aged 18 and above who use financial products/services that are not regulated.
Formal other (Other_F in the data): individuals aged 18 and above who have or use financial products/services provided by regulated financial institutions other than deposit money banks, e.g. microfinance institutions.
Banked: individuals aged 18 and above who use commercial banks to obtain financial products or services.
## [1] "Marital.Status" "Gender" "Source" "Financial.Access"
## [5] "Age.Group" "Sector" "Education.Level"
## Marital.Status Gender
## Married (Monogamy) :14519 Male :12456
## Never married : 5805 Female:12075
## Married (Polygamy) : 2324
## Widowed : 878
## Separated : 510
## Co-Habiting/living together: 211
## (Other) : 284
## Source
## Own business - provide a service (e.g. hairdresser, tailor, mechanic):3745
## Own business/trader - non-farming :3739
## Subsistence/small scale farming :3599
## Household member pays my expenses :2516
## Own business/trader - farming products :2311
## Commercial/large scale farming :1727
## (Other) :6894
## Financial.Access Age.Group Sector
## Banked :10390 15-17: 0 URBAN: 6666
## Other_F : 1397 18-25:6560 RURAL:17865
## Informal: 3670 26-35:8709
## Excluded: 9074 36-45:5936
## 46-55:3326
## 56+ : 0
##
## Education.Level
## Lower levels of education :10863
## Achieved Secondary and above:13668
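The overall inclusion rate follows directly from the Financial.Access counts in the summary above. The short R sketch below adds the three included categories (Banked, Other_F, Informal) and divides by the total number of respondents; the variable names are illustrative, not part of the original analysis.

```r
# Financial.Access counts copied by hand from the summary output above
access_counts <- c(Banked = 10390, Other_F = 1397, Informal = 3670, Excluded = 9074)

# Financially included = Banked + Other_F (formal other) + Informal
included <- sum(access_counts[c("Banked", "Other_F", "Informal")])
total <- sum(access_counts)

inclusion_rate <- included / total
round(100 * c(included = inclusion_rate, excluded = 1 - inclusion_rate), 1)
```

About 63% of respondents are financially included and 37% are excluded (15,457 vs 9,074 people).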
library(dplyr)
library(ggplot2)
gender_data <- summary_data %>% count(Gender, Financial.Access, sort = TRUE)
with(summary_data, ftable(table(Gender, Financial.Access)))
## Financial.Access Banked Other_F Informal Excluded
## Gender
## Male 6059 708 1642 4047
## Female 4331 689 2028 5027
gd <- gender_data %>%
  group_by(Financial.Access) %>%   # percentage of each gender within each access category
  mutate(perc = paste0(round(n * 100 / sum(n), 2), "%")) %>%
  as.data.frame()
g1 <- ggplot(gd, aes(x = Gender, y = n, fill = Financial.Access)) + geom_col() + scale_fill_brewer()
g1 + geom_text(aes(label = perc), size = 6, position = position_stack(vjust = 0.5)) + ggtitle("Financial Access by Gender")
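The percentage labels in the gender chart are computed after `group_by(Financial.Access)`, so each label is a column share: for example, the share of all banked respondents who are male. The distribution of access within each gender is a different quantity. A minimal base-R sketch, assuming the counts are copied by hand from the `ftable()` output above (the matrix `gender_tab` is illustrative), computes both with `prop.table()`:

```r
# Gender x Financial.Access counts copied from the ftable() output above
gender_tab <- matrix(c(6059, 708, 1642, 4047,
                       4331, 689, 2028, 5027),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Gender = c("Male", "Female"),
                                     Financial.Access = c("Banked", "Other_F", "Informal", "Excluded")))

# Column shares (margin = 2): what the chart labels show,
# e.g. the fraction of all Banked respondents who are male
col_share <- prop.table(gender_tab, margin = 2)

# Row shares (margin = 1): how access is distributed within each gender
row_share <- prop.table(gender_tab, margin = 1)

round(100 * col_share, 1)
round(100 * row_share, 1)
```

Read by row, about 32.5% of men and 41.6% of women are excluded; read by column, about 58.3% of banked respondents are men.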
sector_data <- summary_data %>% count(Sector, Gender, Financial.Access, sort = TRUE)
#ftable(table(Gender, Sector, Financial.Access))
sd <- sector_data %>%
  group_by(Financial.Access) %>%   # percentage of each Sector x Gender cell within each access category
  mutate(perc = paste0(round(n * 100 / sum(n), 2), "%")) %>%
  as.data.frame()
s1 <- ggplot(sd, aes(x = Sector, y = n, fill = Financial.Access)) + geom_col() + facet_wrap(vars(Gender)) + scale_fill_brewer()
s1 + geom_text(aes(label = perc), size = 6, position = position_stack(vjust = 0.5)) + ggtitle("Financial Access by Sector")
maritalStatusData <- summary_data %>% count(Marital.Status, Financial.Access, sort = TRUE)
msd <- maritalStatusData %>%
  group_by(Financial.Access) %>%   # percentage of each marital status within each access category
  mutate(perc = paste0(round(n * 100 / sum(n), 2), "%")) %>%
  as.data.frame()
m1 <- ggplot(msd, aes(x = Marital.Status, y = n, fill = Financial.Access)) + geom_col() + theme(axis.text.x = element_text(size = 8)) + scale_fill_brewer()
m1 + ggtitle("Financial Access by Marital Status")
EducationData <- summary_data %>% count(Education.Level, Financial.Access, sort = TRUE)
with(summary_data, ftable(table(Education.Level, Financial.Access)))
## Financial.Access Banked Other_F Informal Excluded
## Education.Level
## Lower levels of education 1631 736 2043 6453
## Achieved Secondary and above 8759 661 1627 2621
ed <- EducationData %>%
  group_by(Financial.Access) %>%   # percentage of each education level within each access category
  mutate(perc = paste0(round(n * 100 / sum(n), 2), "%")) %>%
  as.data.frame()
e1 <- ggplot(ed, aes(x = Education.Level, y = n, fill = Financial.Access)) + geom_col() + theme(axis.text.x = element_text(size = 8)) + scale_fill_brewer()
e1 + geom_text(aes(label = perc), size = 6, position = position_stack(vjust = 0.5)) + ggtitle("Financial Access by Education Level")
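The education table also gives a crude (unadjusted) odds ratio of exclusion, which can be set beside the adjusted estimate from the logistic regression later in the report. A sketch using the counts from the `ftable()` output above, with included = Banked + Other_F + Informal (variable names are illustrative):

```r
# Education.Level x Financial.Access counts copied from the ftable() output above
excluded <- c(lower = 6453, secondary_plus = 2621)
included <- c(lower = 1631 + 736 + 2043,           # Banked + Other_F + Informal
              secondary_plus = 8759 + 661 + 1627)

# Proportion excluded within each education level
excl_rate <- excluded / (excluded + included)

# Crude odds ratio of exclusion: secondary and above vs lower levels
odds_excl <- excluded / included
crude_or <- odds_excl[["secondary_plus"]] / odds_excl[["lower"]]

round(excl_rate, 3)
round(crude_or, 3)
```

About 59% of respondents with lower levels of education are excluded, versus about 19% of those with secondary education and above, a crude odds ratio of roughly 0.16.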
smallholderfarmers <- subset(summary_data, Source == "Subsistence/small scale farming")
farmerData <- smallholderfarmers %>% count(Gender, Financial.Access, sort = TRUE)
with(smallholderfarmers, ftable(table(Gender, Financial.Access)))
## Financial.Access Banked Other_F Informal Excluded
## Gender
## Male 628 157 434 1150
## Female 216 89 344 581
fd <- farmerData %>%
  group_by(Financial.Access) %>%   # percentage of each gender within each access category
  mutate(perc = paste0(round(n * 100 / sum(n), 2), "%")) %>%
  as.data.frame()
f1 <- ggplot(fd, aes(x = Gender, y = n, fill = Financial.Access)) + geom_col() + scale_fill_brewer()
f1 + geom_text(aes(label = perc), size = 8, position = position_stack(vjust = 0.5)) + ggtitle("Financial Access of Smallholder Farmers by Gender")
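The same percentage-by-group pipeline is repeated for gender, sector, marital status, education and smallholder farmers. One way to avoid the repetition is a small helper; `perc_by_group()` below is hypothetical and written in base R (using `ave()`) so it can be checked without dplyr. It is applied here to the smallholder-farmer counts from the `ftable()` output above:

```r
# Hypothetical helper (not part of the original analysis): share of `n`
# within each level of `group`, formatted as a percentage label
perc_by_group <- function(n, group, digits = 2) {
  share <- ave(n, group, FUN = function(x) x / sum(x))
  paste0(round(100 * share, digits), "%")
}

# Smallholder-farmer counts from the ftable() output above, in long format
fd_demo <- data.frame(
  Gender = rep(c("Male", "Female"), each = 4),
  Financial.Access = rep(c("Banked", "Other_F", "Informal", "Excluded"), times = 2),
  n = c(628, 157, 434, 1150, 216, 89, 344, 581)
)

# Same grouping as the pipelines above: percentages within each access category
fd_demo$perc <- perc_by_group(fd_demo$n, fd_demo$Financial.Access)
fd_demo
```

Inside the existing dplyr pipelines the equivalent call would be `mutate(perc = perc_by_group(n, Financial.Access))`, which yields the same labels without a separate `group_by()` step.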
#collapse the 4-level Financial.Access factor ("Banked", "Other_F" (formal other), "Informal", "Excluded")
#into a binary variable FI with levels "Included" (Banked, Other_F, Informal) vs "Excluded"
summary_data$FI <- factor(ifelse(summary_data$Financial.Access == "Excluded", "Excluded", "Included"),
                          levels = c("Included", "Excluded"))
#fit a logistic regression with the binary FI as response
#NOTE: for a factor response, glm() treats the first level as failure and models the
#probability of the second level, so this model describes the odds of being EXCLUDED
glm0 <- glm(FI ~ Gender + Sector + Marital.Status + Education.Level, family = "binomial", data = summary_data)
summary(glm0)
##
## Call:
## glm(formula = FI ~ Gender + Sector + Marital.Status + Education.Level,
##     family = "binomial", data = summary_data)
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -0.46910 0.04272 -10.982
## GenderFemale 0.35937 0.03016 11.917
## SectorRURAL 0.74826 0.03671 20.381
## Marital.StatusMarried (Polygamy) 0.22794 0.04964 4.592
## Marital.StatusCo-Habiting/living together -0.33156 0.16897 -1.962
## Marital.StatusDivorced 0.02070 0.16385 0.126
## Marital.StatusSeparated -0.55810 0.11208 -4.979
## Marital.StatusWidowed -0.32253 0.07828 -4.120
## Marital.StatusNever married 0.20426 0.03809 5.363
## Marital.StatusRefused to answer 0.28928 0.24275 1.192
## Education.LevelAchieved Secondary and above -1.71372 0.03136 -54.638
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## GenderFemale < 2e-16 ***
## SectorRURAL < 2e-16 ***
## Marital.StatusMarried (Polygamy) 4.39e-06 ***
## Marital.StatusCo-Habiting/living together 0.0497 *
## Marital.StatusDivorced 0.8995
## Marital.StatusSeparated 6.38e-07 ***
## Marital.StatusWidowed 3.79e-05 ***
## Marital.StatusNever married 8.20e-08 ***
## Marital.StatusRefused to answer 0.2334
## Education.LevelAchieved Secondary and above < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 32327 on 24530 degrees of freedom
## Residual deviance: 27410 on 24520 degrees of freedom
## AIC: 27432
##
## Number of Fisher Scoring iterations: 4
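It matters which level of FI the model above describes. For a factor response, `glm()` with a binomial family treats the first level as failure and models the probability of the second level; since FI has levels `c("Included", "Excluded")`, glm0 models the odds of being excluded. A minimal check, using the overall counts from the summary output as weights in an intercept-only model (`fi_demo` and `fit0` are illustrative names):

```r
# Two-row data set: FI with the same level order as above, weighted by the overall counts
fi_demo <- data.frame(
  FI = factor(c("Included", "Excluded"), levels = c("Included", "Excluded")),
  w  = c(15457, 9074)
)

# Intercept-only logistic regression; w gives the number of respondents per row
fit0 <- glm(FI ~ 1, family = binomial, data = fi_demo, weights = w)

# plogis() converts the intercept (log-odds) to a probability:
# it equals the share EXCLUDED, i.e. glm() modelled the second level
p_hat <- plogis(coef(fit0)[["(Intercept)"]])
c(p_hat = p_hat, share_excluded = 9074 / 24531)
```

Refitting after `summary_data$FI <- relevel(summary_data$FI, ref = "Excluded")` would model inclusion instead; every coefficient would keep its magnitude and flip its sign.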
exp(glm0$coefficients)
## (Intercept)
## 0.6255677
## GenderFemale
## 1.4324302
## SectorRURAL
## 2.1133146
## Marital.StatusMarried (Polygamy)
## 1.2560060
## Marital.StatusCo-Habiting/living together
## 0.7178046
## Marital.StatusDivorced
## 1.0209130
## Marital.StatusSeparated
## 0.5722936
## Marital.StatusWidowed
## 0.7243152
## Marital.StatusNever married
## 1.2266177
## Marital.StatusRefused to answer
## 1.3354692
## Education.LevelAchieved Secondary and above
## 0.1801936
#plot residuals to check for goodness of fit
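From the logistic regression, at significance level p < 0.01, the key drivers of financial inclusion are listed below. Because the response is coded with "Excluded" as its second level, the exponentiated coefficients are odds ratios of being excluded; their reciprocals are the odds ratios of being included.
Gender: women have 1.43 times the odds of being excluded that men have (43% higher odds), i.e. about 0.70 times the odds of being included.
Sector: rural residents have 2.11 times the odds of being excluded that urban residents have, i.e. about 0.47 times the odds of being included.
Education Level: those who achieved secondary education and above have 0.18 times the odds of being excluded (82% lower odds) compared with those with lower levels of education, i.e. about 5.5 times the odds of being included.
Marital Status: compared with married (monogamous) respondents, the odds of being excluded are 1.26 times as high for married (polygamous) and 1.23 times as high for never married, but lower for widowed (0.72 times) and separated (0.57 times) respondents. The "Divorced", "Co-habiting/living together" and "Refused to answer" categories are not significant at p < 0.01.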
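Since the model's response is exclusion, odds ratios of inclusion are the reciprocals of the exponentiated coefficients (flipping the outcome negates the log-odds). A short sketch with values copied from `exp(glm0$coefficients)` above; the vector names are illustrative:

```r
# Odds ratios of EXCLUSION copied from exp(glm0$coefficients) above
or_excluded <- c(GenderFemale      = 1.4324302,
                 SectorRURAL       = 2.1133146,
                 SecondaryAndAbove = 0.1801936,
                 Separated         = 0.5722936)

# log-odds(included) = -log-odds(excluded), so the odds ratio of
# INCLUSION for the same comparison is the reciprocal
or_included <- 1 / or_excluded
round(or_included, 2)
```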
#count respondents in each Marital.Status x Gender x Education.Level x Sector x FI cell,
#giving a count response for a possible Poisson regression
data <- summary_data %>% count(Marital.Status, Gender, Education.Level, Sector, FI, sort = TRUE)
sample <- subset(data, FI == "Included")
names(sample)[names(sample) == "n"] <- "count.included"
head(sample)
## Marital.Status Gender Education.Level Sector FI
## 1 Married (Monogamy) Male Achieved Secondary and above RURAL Included
## 4 Married (Monogamy) Female Achieved Secondary and above RURAL Included
## 5 Never married Male Achieved Secondary and above RURAL Included
## 6 Married (Monogamy) Female Achieved Secondary and above URBAN Included
## 7 Married (Monogamy) Male Lower levels of education RURAL Included
## 8 Married (Monogamy) Male Achieved Secondary and above URBAN Included
## count.included
## 1 2322
## 4 1516
## 5 1366
## 6 1297
## 7 1145
## 8 1123
#EXPLORATORY DATA ANALYSIS 1: boxplot of the financial inclusion count variable
boxplot(sample$count.included)
#EXPLORATORY DATA ANALYSIS 2: check the mean and variance
mean(sample$count.included)
## [1] 241.5156
var(sample$count.included)
## [1] 213725.9
#The variance is far larger than the mean, which suggests overdispersion relative to a Poisson model:
#the Poisson coefficient estimates would still be usable, but their standard errors would be too small
#EXPLORATORY DATA ANALYSIS 3: histogram and empirical probability mass function of the counts
hist(sample$count.included)
plot(prop.table(table(sample$count.included)), ylab = "Proportion")
#FITTING a negative binomial model to remedy overdispersion
library(MASS)
glm2 <- glm.nb(count.included ~ Gender + Education.Level + Sector + Marital.Status, data = sample)
summary(glm2)
##
## Call:
## glm.nb(formula = count.included ~ Gender + Education.Level +
##     Sector + Marital.Status, data = sample, init.theta = 3.525431838,
##     link = log)
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) 5.9142 0.2261 26.153
## GenderFemale 0.2828 0.1425 1.985
## Education.LevelAchieved Secondary and above 0.7948 0.1428 5.568
## SectorRURAL 0.8402 0.1428 5.885
## Marital.StatusMarried (Polygamy) -2.0349 0.2688 -7.572
## Marital.StatusCo-Habiting/living together -4.0611 0.2819 -14.405
## Marital.StatusDivorced -4.2604 0.2850 -14.948
## Marital.StatusSeparated -3.1183 0.2729 -11.428
## Marital.StatusWidowed -2.8208 0.2713 -10.398
## Marital.StatusNever married -0.8646 0.2673 -3.235
## Marital.StatusRefused to answer -5.1844 0.3086 -16.802
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## GenderFemale 0.04720 *
## Education.LevelAchieved Secondary and above 2.58e-08 ***
## SectorRURAL 3.99e-09 ***
## Marital.StatusMarried (Polygamy) 3.69e-14 ***
## Marital.StatusCo-Habiting/living together < 2e-16 ***
## Marital.StatusDivorced < 2e-16 ***
## Marital.StatusSeparated < 2e-16 ***
## Marital.StatusWidowed < 2e-16 ***
## Marital.StatusNever married 0.00122 **
## Marital.StatusRefused to answer < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(3.5254) family taken to be 1)
##
## Null deviance: 665.361 on 63 degrees of freedom
## Residual deviance: 63.944 on 53 degrees of freedom
## AIC: 641.55
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 3.525
## Std. Err.: 0.659
##
## 2 x log-likelihood: -617.555
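`MASS::glm.nb()` uses a log link and the negative binomial variance function below, so the estimated θ controls how much extra variance the model allows beyond a Poisson fit:

```latex
\log \mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}, \qquad
\operatorname{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta}, \qquad
\hat\theta = 3.525 \ (\text{s.e. } 0.659)
```

As θ grows the second term vanishes and the model reduces to Poisson regression; a fitted θ of about 3.5 indicates substantial extra-Poisson variation.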
exp(glm2$coefficients)
## (Intercept)
## 3.702632e+02
## GenderFemale
## 1.326880e+00
## Education.LevelAchieved Secondary and above
## 2.213955e+00
## SectorRURAL
## 2.316871e+00
## Marital.StatusMarried (Polygamy)
## 1.306891e-01
## Marital.StatusCo-Habiting/living together
## 1.722981e-02
## Marital.StatusDivorced
## 1.411606e-02
## Marital.StatusSeparated
## 4.423157e-02
## Marital.StatusWidowed
## 5.955828e-02
## Marital.StatusNever married
## 4.212399e-01
## Marital.StatusRefused to answer
## 5.603072e-03
#check for overdispersion: Pearson chi-squared statistic divided by the residual degrees of freedom
dp <- sum(residuals(glm2, type = "pearson")^2) / glm2$df.residual
dp
## [1] 1.130467
#dp is close to 1, so the negative binomial model has accounted for the overdispersion.
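The dispersion statistic computed above is the sum of squared Pearson residuals divided by the residual degrees of freedom, with each residual scaled by the negative binomial variance function:

```latex
\hat\phi = \frac{1}{n-p}\sum_{i=1}^{n} r_i^2, \qquad
r_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i + \hat\mu_i^2/\hat\theta}}

n - p = 64 - 11 = 53, \qquad \hat\phi = 1.1305
\;\Rightarrow\; \sum_{i=1}^{64} r_i^2 \approx 53 \times 1.1305 \approx 59.9
```

A value near 1 means the residual variation is about what the fitted variance function predicts.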
#Plot residuals to check for goodness of fit
par(mfrow=c(2,2))
plot(glm2)
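Although the AIC of the negative binomial regression (641.55) is far lower than that of the logistic regression (27432), the two values cannot be compared: the models are fitted to different responses and data (64 aggregated cell counts versus 24,531 individual binary outcomes), so the difference does not show that one model explains financial inclusion better.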
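Both AIC values can be reproduced from the model output, which shows that each is built from the log-likelihood of a different response on different data. For ungrouped binary data the residual deviance equals -2 log L, and the negative binomial count of parameters includes θ:

```latex
\mathrm{AIC} = -2\log\hat L + 2k

\text{Logistic (24{,}531 binary responses): } -2\log\hat L = 27410,\ k = 11
\;\Rightarrow\; \mathrm{AIC} = 27410 + 22 = 27432

\text{Negative binomial (64 cell counts): } -2\log\hat L = 617.555,\ k = 11 + 1\ (\theta)
\;\Rightarrow\; \mathrm{AIC} = 617.555 + 24 = 641.555
```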
The residual plots show residuals that are close to normally distributed, with three possible outliers.
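At significance level p < 0.01, all predictors are significant except Gender (p = 0.047). Holding the other predictors constant:
Sector: the expected count of financially included individuals is 2.32 times as high in rural cells as in urban cells.
Education Level: the expected count of financially included individuals is 2.21 times as high for those who achieved secondary education and above as for those with lower levels of education.
Marital Status: compared with married (monogamous), the expected count of financially included individuals is lower in every other category, e.g. 0.13 times for married (polygamous), 0.42 times for never married and 0.06 times for widowed.
Because the model has no offset for the number of respondents in each cell, these ratios compare numbers of included people rather than inclusion rates, and they partly reflect group size (for example, there are 17,865 rural but only 6,666 urban respondents).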
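The usual way to turn a count model into a model of inclusion rates is to add the log of the number of respondents in each cell as an offset. A minimal sketch, using the smallholder-farmer counts by gender from earlier in the report and a Poisson GLM for simplicity (with only two cells a negative binomial θ cannot be estimated), shows how much the offset changes the gender comparison; the data frame `farm` is illustrative:

```r
# Smallholder farmers, from the ftable() output earlier in the report:
# number financially included (Banked + Other_F + Informal) and total, by gender
farm <- data.frame(
  Gender   = factor(c("Male", "Female"), levels = c("Male", "Female")),
  included = c(628 + 157 + 434, 216 + 89 + 344),                # 1219, 649
  total    = c(628 + 157 + 434 + 1150, 216 + 89 + 344 + 581)    # 2369, 1230
)

# Count model without an offset: compares raw numbers of included people
fit_count <- glm(included ~ Gender, family = poisson, data = farm)

# Rate model: offset(log(total)) turns the coefficient into a ratio of inclusion RATES
fit_rate <- glm(included ~ Gender + offset(log(total)), family = poisson, data = farm)

exp(coef(fit_count))[["GenderFemale"]]  # about 0.53: mostly reflects fewer female farmers
exp(coef(fit_rate))[["GenderFemale"]]   # about 1.03: inclusion rates are almost equal
```

On the full 64-cell table the analogous change would be to add a column holding the number of respondents in each cell (for example a hypothetical `total` column) and include `offset(log(total))` in the `glm.nb()` formula; alternatively, a binomial GLM on `cbind(included, excluded)` models the inclusion proportion directly.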