Factors Affecting Contraception Use in Indonesia

Introduction

Since the late 1960s, Indonesia has seen fertility, as measured by the total fertility rate (TFR), declines by more than 50 percent, today it is around 2.26. We still have a way to go, as the ideal is for each family to have only 2 children. The use of contraception is essential in slowing down the population growth and reducing maternal morbidity and mortality by preventing unplanned pregnancies. Equally as important is how contraception usage reduces poverty. A growing body of studies found that family planning programs are associated with a decrease share of children and adults living in poverty (Bailey et al, 2014). Family planning program could increase investments in children through both income and price channels.

However, despite the ongoing effort of Indonesia government to promote Family Planning (FP), only 6 out of 10 poor married individuals in Indonesia use contraception of any form. Previous studies have shown evidence of the influence of age, educational level, socio-economic status, awareness of Family Planning methods and access to primary healthcare to the usage on contraception. This study aims to study the factors influencing contraception usage among the 40% poorest in Indonesia. The result of this study may aid policymakers in designing policy to promote contraception use among the poorest.

Methods

This study focuses on investigating the impact of socio-economics factors on contraceptive use among the 40% poorest in Indonesia, on provincial level. There is a total of 34 provinces, that are sampled in this study. The data used for this study are secondary, taken from the Badan Pusat Statistik (Statistics Indonesia) in 2020. The dependent variable is the percentage of contraception use among the 40% poorest, while the independent variables are the poverty severity index, years of schooling, percentage of household with access to primary healthcare, percentage of household with access to decent sanitation and percentage of household with at least one smoker.

Data Preprocessing

library(dplyr)
library(MASS)
library(GGally)
library(MLmetrics)
library(performance)
library(ggplot2)
library(lmtest)
library(car)
library(see)
library(patchwork)

Read the data

bps <- read.csv('bps.csv')
tail(bps)

##          provinsi parahmiskin kemiskinan sanitasilayak perokok kontrasepsi
## 30 SULAWESI BARAT       0.355     10.985         44.61   27.06       51.18
## 31         MALUKU       0.995     17.670         50.86   27.09       44.77
## 32   MALUKU UTARA       0.250      6.840         49.52   31.18       54.01
## 33    PAPUA BARAT       1.935     21.840         57.81   28.67       34.99
## 34          PAPUA       2.290     27.040         15.78   26.05       11.87
## 35                         NA         NA            NA      NA          NA
##    lamasekolah aksesfaskes
## 30        8.33       79.52
## 31       10.20       79.13
## 32        9.42       80.52
## 33       10.00       73.53
## 34        6.96       36.44
## 35          NA          NA

Handling missing values

bps <- na.omit(bps)

Set provinsi column as index

rownames(bps) <- bps$provinsi
bps$provinsi <- NULL

bps

##                      parahmiskin kemiskinan sanitasilayak perokok kontrasepsi
## ACEH                       0.665  15.165000         52.32   28.70       51.70
## SUMATERA UTARA             0.340   8.729999         61.15   27.46       47.63
## SUMATERA BARAT             0.180   6.355000         40.79   30.75       55.72
## RIAU                       0.260   6.990000         59.17   29.04       59.38
## JAMBI                      0.280   7.555000         51.02   28.54       70.36
## SUMATERA SELATAN           0.515  12.635000         53.60   30.91       71.45
## BENGKULU                   0.515  15.070000         30.50   33.14       72.57
## LAMPUNG                    0.440  12.460000         40.55   34.39       69.60
## KEP. BANGKA BELITUNG       0.110   4.560000         78.42   29.18       70.18
## KEP. RIAU                  0.150   5.850000         74.05   27.59       50.60
## DKI JAKARTA                0.090   3.445000         85.53   26.04       61.42
## JAWA BARAT                 0.240   6.865000         51.21   32.97       67.89
## JAWA TENGAH                0.290  10.690000         65.01   27.40       63.09
## DI YOGYAKARTA              0.340  11.570000         80.19   22.87       63.12
## JAWA TIMUR                 0.410  10.285000         53.98   27.93       66.24
## BANTEN                     0.205   5.015000         50.31   31.69       66.05
## BALI                       0.105   3.700000         83.73   20.96       63.03
## NUSA TENGGARA BARAT        0.500  14.220000         63.09   30.49       60.92
## NUSA TENGGARA TIMUR        1.125  20.855000         35.47   27.33       43.11
## KALIMANTAN BARAT           0.240   7.385000         39.26   28.50       66.72
## KALIMANTAN TENGAH          0.155   4.895000         41.45   29.84       70.14
## KALIMANTAN SELATAN         0.155   4.510000         47.52   23.95       73.48
## KALIMANTAN TIMUR           0.225   5.925000         68.24   24.52       60.09
## KALIMANTAN UTARA           0.290   6.560000         61.10   27.63       53.14
## SULAWESI UTARA             0.210   7.585000         60.92   28.41       69.72
## SULAWESI TENGAH            0.890  13.330000         44.31   31.64       62.93
## SULAWESI SELATAN           0.385   8.625000         69.41   25.59       54.47
## SULAWESI TENGGARA          0.590  11.140000         56.00   26.80       51.22
## GORONTALO                  0.595  15.415000         42.81   32.37       69.74
## SULAWESI BARAT             0.355  10.985000         44.61   27.06       51.18
## MALUKU                     0.995  17.670000         50.86   27.09       44.77
## MALUKU UTARA               0.250   6.840000         49.52   31.18       54.01
## PAPUA BARAT                1.935  21.840000         57.81   28.67       34.99
## PAPUA                      2.290  27.040001         15.78   26.05       11.87
##                      lamasekolah aksesfaskes
## ACEH                        9.71       67.98
## SUMATERA UTARA              9.83       67.30
## SUMATERA BARAT              9.34       82.69
## RIAU                        9.47       73.05
## JAMBI                       8.97       73.12
## SUMATERA SELATAN            8.68       75.04
## BENGKULU                    9.20       79.12
## LAMPUNG                     8.51       78.06
## KEP. BANGKA BELITUNG        8.49       86.67
## KEP. RIAU                  10.22       80.35
## DKI JAKARTA                11.17       76.83
## JAWA BARAT                  8.96       79.10
## JAWA TENGAH                 8.19       84.42
## DI YOGYAKARTA               9.95       84.64
## JAWA TIMUR                  8.31       79.36
## BANTEN                      9.22       78.36
## BALI                        9.31       89.68
## NUSA TENGGARA BARAT         8.08       74.26
## NUSA TENGGARA TIMUR         8.09       54.43
## KALIMANTAN BARAT            7.90       72.90
## KALIMANTAN TENGAH           8.95       77.82
## KALIMANTAN SELATAN          8.69       86.70
## KALIMANTAN TIMUR            9.99       76.83
## KALIMANTAN UTARA            9.30       87.02
## SULAWESI UTARA              9.74       83.71
## SULAWESI TENGAH             9.09       81.93
## SULAWESI SELATAN            8.86       88.71
## SULAWESI TENGGARA           9.41       85.97
## GORONTALO                   8.26       85.40
## SULAWESI BARAT              8.33       79.52
## MALUKU                     10.20       79.13
## MALUKU UTARA                9.42       80.52
## PAPUA BARAT                10.00       73.53
## PAPUA                       6.96       36.44

Remove unused column.

bps <- bps[-c(2,4)]

Rearrange dataframe column.

bps <- bps[, c(3,1,2,4,5)]

Exploratory Data Analysis

str(bps)

## 'data.frame':    34 obs. of  5 variables:
##  $ kontrasepsi  : num  51.7 47.6 55.7 59.4 70.4 ...
##  $ parahmiskin  : num  0.665 0.34 0.18 0.26 0.28 ...
##  $ sanitasilayak: num  52.3 61.2 40.8 59.2 51 ...
##  $ lamasekolah  : num  9.71 9.83 9.34 9.47 8.97 ...
##  $ aksesfaskes  : num  68 67.3 82.7 73.1 73.1 ...

Data Description

provinsi: the province state of Indonesia
kontrasepsi: percentage of married couple using contraception of any form
parahmiskin: poverty severity index
sanitasilayak: percentage of access to decent sanitation
lamasekolah: duration of formal schooling
aksesfakses: percentage of access to primary healthcare

summary(bps)

##   kontrasepsi     parahmiskin     sanitasilayak    lamasekolah    
##  Min.   :11.87   Min.   :0.0900   Min.   :15.78   Min.   : 6.960  
##  1st Qu.:52.06   1st Qu.:0.2137   1st Qu.:44.38   1st Qu.: 8.495  
##  Median :62.17   Median :0.3150   Median :52.96   Median : 9.145  
##  Mean   :58.90   Mean   :0.4800   Mean   :54.70   Mean   : 9.082  
##  3rd Qu.:69.17   3rd Qu.:0.5150   3rd Qu.:62.60   3rd Qu.: 9.650  
##  Max.   :73.48   Max.   :2.2900   Max.   :85.53   Max.   :11.170  
##   aksesfaskes   
##  Min.   :36.44  
##  1st Qu.:74.45  
##  Median :79.12  
##  Mean   :77.66  
##  3rd Qu.:84.24  
##  Max.   :89.68

Takeaway: - In Indonesia, only 59% of poor married couples are using contraception, with each province ranging from 12% (gasp!) to 73%.

Poverty severity index is the variance of expenditure of the poor relative to the average poor people’s expenditure at a location. The higher the value, the more severe the disparity even among the poor. In Indonesia, the severity of poverty is averaging 0.48.
Sanitation may play a big role in contraception use among the poor. It might be less obvious, but one of the barrier to meeting the sexual and reproductive health and rights needs of population is the lack access to clean water, decent toilets, and good hygiene. For example, long-acting reversible contraceptives (LARCs),such as intrauterine devices and implants, require clean hands and clean instruments for insertion and removal to prevent infection. Insertable barrier methods (such as a diaphragms, cervical caps and sponges) and insertable hormonal methods (such as the vaginal contraceptive ring) also require clean hands in order to prevent vaginal infections. The percentage of poor household with access to decent sanitation is averaging 54.7%, ranging from 16-86%.
The average of years of schooling in Indonesia is 9 years, ranging form 7 years to 11 years. The assumption that longer years of schooling equates higher sexual education is going to be tested later.
The percentage of household with access to primary healthcare paints a better picture, averaging 78% in Indonesia. However, this number isn’t exclusive to poor household.

Now, I will draw a plot for preliminary assessment of how each independent variables correlates to conraception use.

ggpairs(bps)

The strongest correlation pattern from both scatterplots and correlation values are between contraception use with the poverty severity and access to primary healthcare. The higher the poverty severity, the lower the percentage of contraception use. Meanwhile, the higher the access to primary healthcare, the higher the percentage of contraception use.

Contraception use among the 40% poorest only has a weak positive correlation with decent sanitation and almost no correlation with years of school.

Then, we will use boxplot for simpler graph to show how the values in the data are spread out.

boxplot(bps)

Takeaway:

From the longer tail on the bottom, we can say that contraception use is left skewed. The mean is less than the median. There is one outlier in the bottom too.
As the range of value of poverty severity index is not that high, it is hard to conclude from the plot.However, it seems that there are 3 outlier on the upper side.
Access to decent sanitation is skewed right, the mean is more than the median.There is one outlier in the bottom though.
As is poverty severty, years of schooling doesn’t have a big range, therefore it is hard to conclude from the plot. However, the observation is fairly evenly distributed.
Percentage access to primary healthcare is evenly distributed with 2 outliers.

Linear Regression Modelling

model_all <- lm(kontrasepsi ~., data = bps)
summary(model_all)

## 
## Call:
## lm(formula = kontrasepsi ~ ., data = bps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.944  -5.843   1.472   5.255  13.341 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    59.6280    20.0668   2.971   0.0059 ** 
## parahmiskin   -17.2351     3.7566  -4.588 7.96e-05 ***
## sanitasilayak  -0.1678     0.1180  -1.423   0.1655    
## lamasekolah    -2.0423     1.9369  -1.054   0.3004    
## aksesfaskes     0.4542     0.1865   2.435   0.0213 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.552 on 29 degrees of freedom
## Multiple R-squared:  0.6882, Adjusted R-squared:  0.6452 
## F-statistic:    16 on 4 and 29 DF,  p-value: 5.028e-07

Takeaway:

Two predictor variables are significant: poverty severety index and access to primary healthcare with at least 95% confidence interval. It sugests that changes in both predictors are associated with changes in the contraception use. Meanwhile, years of schooling and access to decent sanitation doesn’t impact contraception use significantly.
The model can explain 64.5% of the variances of contraception use, while the remaining is explained by variables outside the model.
However, we are witnessing a special case of suppresion on both sanitasilayak and lamasekolah. Supression is when the regression coefficient and the correlation between dependent and independent variable do not have the same sign. Both sanitasilayak and lamasekolah has a positive correlation with kontrasepsi. However, their regression coefficient signs are negative.

According to Falk and Miller (1992), there are three reasons why suppresion happens:

The original relation between both variables is close to zero, that the difference in the signs simply refleects random variation around zero.
Multicollinearity.
Real suppresion: an important predictor variable, necessary in understanding the true relationship between the latent variables, suppresses the effect of another predictor variable.

For sanitasilayak with close to zero correlation with kontrasepsi, it is possbile than both reasons (a) and (c) cause the flip of the sign. However, for sanitasilayak it is possible that reason (b) and (c) are the cause. To begin with, sanitasilayak and parahmiskin has a correlation of 0.49. While not terribly high, it still could cause coeffiecient sign flip (it is supposed to be positive sign in simple linear regression). Not to mention, the variable parahmiskin explain kontrasepsi so well that it supresses sanitasilayak which is correlated to it, with weaker predictive ability to kontrasepsi.

For now, we shall see what stepwise regression has to say.

Model Evaluation

Model Performance

Before testing the assumpstions of linear regressions, we are going to do stepwise regression. It is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In particular, we are performing backward stepwise regression, an approach that begins with a full model and at each step gradually eliminates variables from the model.

model_back <- step(object = model_all, direction = "backward", trace =0)
summary(model_back)

## 
## Call:
## lm(formula = kontrasepsi ~ parahmiskin + sanitasilayak + aksesfaskes, 
##     data = bps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.796  -6.586   1.733   5.132  14.065 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    45.6134    15.0620   3.028  0.00502 ** 
## parahmiskin   -17.4109     3.7599  -4.631  6.6e-05 ***
## sanitasilayak  -0.2290     0.1029  -2.224  0.03381 *  
## aksesfaskes     0.4399     0.1864   2.360  0.02496 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.566 on 30 degrees of freedom
## Multiple R-squared:  0.6763, Adjusted R-squared:  0.6439 
## F-statistic: 20.89 on 3 and 30 DF,  p-value: 1.68e-07

The stepwise regression remove years of schooling variable from the model, as suspected. Altough previous studies attributes education as a factor to contraception usage, in Indonesia, the lack of sexual education in formal education makes it that even higher educated people in Indonesia are as lacking awareness of the benefit of contraception as their lower educated counterparts. Now, aside from poverty severty and access to primary healthcare, the access to decent sanitation also become significant.

The stepwise regression, however, doesn’t remove sanitasilayak variable which is either being supressed by parahmiskin. Although the coeffiecient sign of sanitasilayak is flipped (it is supposed to be positive in its simple linear regression), it is stil fine to have both in the model. We can say that sanitasilayak is correlated with parahmiskin such that the marginal relationship between sanitasilayak and kontrasepsi is positive, but the relationship would be negative after controlling for parahmiskin.

It is crucial to note that sanitasilayak has a higher correlation with another independent variable parahmiskin than it is with the target variable. Later, we will test for multicollinearity using VIF to rule out a violation.

For now, we can apply model selection criterion such as AIC and BIC.

compare_performance(model_all, model_back)

## # Comparison of Model Performance Indices
## 
## Name       | Model |     AIC | AIC weights |     BIC | BIC weights |    R2 | R2 (adj.) |  RMSE | Sigma
## ------------------------------------------------------------------------------------------------------
## model_all  |    lm | 240.559 |       0.411 | 249.717 |       0.245 | 0.688 |     0.645 | 6.974 | 7.552
## model_back |    lm | 239.838 |       0.589 | 247.470 |       0.755 | 0.676 |     0.644 | 7.107 | 7.566

model_back has both lower AIC and BIC than model_all, albeit slightly lower R^2 and higher RMSE. The chosen model with less information loss is model_back. Now that we have chosen model_last, let’s interpret the model.

summary(model_back)

## 
## Call:
## lm(formula = kontrasepsi ~ parahmiskin + sanitasilayak + aksesfaskes, 
##     data = bps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.796  -6.586   1.733   5.132  14.065 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    45.6134    15.0620   3.028  0.00502 ** 
## parahmiskin   -17.4109     3.7599  -4.631  6.6e-05 ***
## sanitasilayak  -0.2290     0.1029  -2.224  0.03381 *  
## aksesfaskes     0.4399     0.1864   2.360  0.02496 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.566 on 30 degrees of freedom
## Multiple R-squared:  0.6763, Adjusted R-squared:  0.6439 
## F-statistic: 20.89 on 3 and 30 DF,  p-value: 1.68e-07

Remember:

kontrasepsi: percentage of married couple using contraception of any form
parahmiskin: poverty severity index
sanitasilayak: percentage of access to decent sanitation
aksesfakses: percentage of access to primary healthcare

Interpretations: - Intercept: 45.6134

If all predictor variables are 0, then the percentage of contraception use among the 40% poorest would be 45.7%.

parahmiskin: -17.4109

All else equals, every 1 point increase in poverty severity index, there would be a -17.4% reduction in contraception use among the 40% poorest. This is in line with the theory that poverty hinders the ability to procure means of contraception. A severe poverty even more so.

sanitasilayak: -0.2290

All else equals, every 1% increase of the poor’s access to sanitation would reduce contraception use among the poorest by by 0.23%. sanitasilayak is correlated with parahmiskin such that the marginal relationship between sanitasilayak and kontrasepsi is positive, but the relationship would be negative after controlling for parahmiskin.

aksesfaskes: 0.4399

All else equals, every 1% increase of access to primary healthcare, there would be a 0.44% increase of contraception usage in the 40% poorest.

Assumption

let’s test for Gauss Markov assumption.

Linearity

The relationship between independent and dependent variable is linear. If this assumption is violated, the linear regression may not be able to explain the data. The prediction would be seriously spurious (false).

Linearity hypothesis test:

\[ H_0: correlation\ isn't\ significant\\ H_1: correlation\ is \ significant \]

ggcorr(bps, label = T, label_size = 3, hjust = 1)

model_back$call

## lm(formula = kontrasepsi ~ parahmiskin + sanitasilayak + aksesfaskes, 
##     data = bps)

The correlations between contraception use and access to poverty severity and access to primary healthcare are strong, while it is weak between contraception use and access to decent sanitation. It is still a correlation, nonetheless. After controlling for other variables in model_back, the variable becomes significant.

H0 is rejected, the correlation is significant.

Normality of Residuals

The residual of a linear regression should be evenly distributed with a mean of 0 , meaning that most the error should be gather around the value of 0. If the error distribution is significantly non-normal, confidence intervals may be too wide or too narrow.

\[ H_0: error/residuals\ is\ normally\ distributed\\ H_1: error/residual\ isn't\ normally\ distributed\\ \]

hist(model_back$residuals)

From the graphic, it seems that model_back’s residuals are heavy-tailed on the left. This isn’t surprising, considering the amount of outliers on the left hand side we encounter earlier during exploratory data analysis. The residuals doesn’t look normal.

To be sure, then we run Shapiro test to check the normality of model_back’s residuals.

shapiro.test(model_back$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model_back$residuals
## W = 0.93362, p-value = 0.03991

p-value is lower than 0.05, therefore, H0 is rejected. The model doesn’t pass normality test. As the consequences, significancy test and error value of intercept and slopes of each predictor are biased. We can remediate this by:

Adding more observations
Transforming data of target variable.

When I was new to regression, I would do anything to fix normality issue in residual. However, we should note that out of all linear regression assumption, normality of residuals is the least important of all (Gelman & Hill, 2006). In general, normality isn’t a good assumption. The normal distribution has very light tails, and this makes the regression estimate quite sensitive to outliers. For the case of this regression, it is unreasonable to remove outliers considering the sample size.

For the purpose of estimating the regression line, the assumption of normality is barely important at all, as long as the tail isn’t too heavy. The model_back does have heavier left tail, but it isn’t that heavy.

Unfortunately, all that said, this would compromise with the standard error of our predicted values. But I wouldn’t be very worried; the fact that many unimodal distributions have very close to 95% of their distribution within about 2 standard deviation of the mean tends to result in reasonable performance of prediction. Thus, I would proceed with this model.

Homoskedasticity

Homoskedasticity means that all its random variables have the same finite variance. The violation of homoskedasticity means that we have heteroskedascity in our model. The model would remain unbiased, but it is no longer the best linear unbiased estimator.

\[ H_0: Error\ variance\ spread\ evenly\ (Homoscedasticity)\\ H_1: Error\ variance\ doesn't\ spread\ evenly\ creating\ a\ pattern\ (Heteroscedasticity) \]

plot(model_back$fitted.values, model_back$residuals)
abline(h=0, col= "red", lty =2)

The residuals variances seems to spread quite randomly, but we can’t be too sure. So, we can proceed to Breusch-Pagan test.

bptest(model_back)

## 
##  studentized Breusch-Pagan test
## 
## data:  model_back
## BP = 3.025, df = 3, p-value = 0.3878

p-value is higher than 0.05, it fails to reject H0. Therefore, there is no heteroskedasticity in the model.

Multicolinearity

Multicolinearity happens when there is a strong correlation between predictor variables. Using the VIF test, if the VIF > 10, then there is multicolinearity.

vif(model_back)

##   parahmiskin sanitasilayak   aksesfaskes 
##      1.915822      1.459501      2.057548

In spite of our finding of a strong correlation between predictor, there is no multicolinearity, which isn’t suprising since the correlation isn’t more than 0.8. We can safely proceed with this model.

Prediction

prediksi_kontrap <- predict(object = model_back, newdata=bps)

MAPE(y_pred = prediksi_kontrap, y_true = bps$kontrasepsi )

## [1] 0.1133945

On average, the forecast using model_back is off by 11.3%. Since it is less than 20, model_back has a good prediction ability.

Conclusion

There are significant relationships between sanitation, healthcare and the severity of poverty with contraception usage among the 40% poorest in Indonesia. An increase in the severity of poverty and sanitation would decrease the usage of contraception, while an increase in access to basic healthcare for the poor would increase the contraception use among the poorest. Meanwhile, there is no significant impact of education on contraception usage. Higher formal education doesn’t translate to higher usage of contraception, which might be caused by poor integration of sexual education in Indonesia’s formal education. Our final model has satisfied classical assumptions except for normality assumption, which is sensitive to outlier. It can explain 64.4% of the variance in target variable, while the remaining is explained by variables outside the model. The model also has a decent prediction ability, with MAPE 11.3%, which is less than 20.

Since contraception usage is one of the government’s forefront strategy to mitigate poverty, this research propose targeted interventions to increase the uptake of contraceptive methods among the 40% poorest in Indonesia. To achieve this, government needs to pay attention to improve access to sanitation and basic healthcare for the poor, and also a better welfare handouts scheme to not only address poverty, but also the severity of poverty. Meanwhile, Indonesia education system needs to integrate more with sexual education throughout formal education for it to have a significant impact on the awareness of contraception usage and alleviate myths and taboos in the society.