The data I will be using for this analysis is from the 1988 Bangladesh Fertility Survey. I am exclusively using a sub-sample of 1,934 women grouped in 60 various districts. I am interested to see if contraceptive use of women varies by districts, age, and number of children. Unique features of this data are the district and the individual woman. Women are nested within districts. Districts provide a context for women.
How to cite this model in Zelig: Kosuke Imai, Gary King, and Oliva Lau. 2007. “normal: Normal Regression for Continuous Dependent Variables” in Kosuke Imai, Gary King, and Olivia Lau, “Zelig: Everyone’s Statistical Software,” http://gking.harvard.edu/zelig
| Dependent variable: | |
| use | |
| age | 0.002 (0.001) |
| Constant | 0.392*** (0.011) |
| Observations | 1,934 |
| Log Likelihood | -1,358.061 |
| Akaike Inf. Crit. | 2,720.122 |
| Note: | p<0.1; p<0.05; p<0.01 |
The table above tells us that contraceptive use increases with age and this is significant. However, this model does not take into consideration the nesting structure of this data-set such as district and number of living children. Hence, further analysis is necessary.
As seen below the histogram of contraceptive use in relation to living children produces normally distributed results:
dcoef <- bang %>%
group_by(district) %>%
do(mod = lm(use ~ lc, data = .))
coef <- dcoef %>% do(data.frame(lc = coef(.$mod)[2]))
hist(coef$lc)
coef <- data.frame(coef)
Desc(coef)
##
## -------------------------------------------------------------------------
## 'data.frame': 60 obs. of 1 variable:
## 1 $ lc: num 1.99e-02 1.03e-01 -1.05e-16 6.76e-02 2.23e-02 ...
##
## -------------------------------------------------------------------------
## 1 - lc (numeric)
##
## length n NAs unique 0s mean meanSE
## 60 60 0 59 2 0.053 0.013
##
## .05 .10 .25 median .75 .90 .95
## -0.079 -0.063 0.001 0.049 0.106 0.165 0.190
##
## rng sd vcoef mad IQR skew kurt
## 0.558 0.098 1.839 0.074 0.105 0.001 1.660
##
## lowest : -0.2272727, -0.217759, -0.07988529, -0.07918149, -0.07239459
## highest: 0.1710526, 0.1894957, 0.1935841, 0.3235294, 0.3303571
##
## Shapiro-Wilks normality test p.value : 0.022506
m1_lme <- lmer(use ~ age + (1|district), data = bang)
summary(m1_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: use ~ age + (1 | district)
## Data: bang
##
## REML criterion at convergence: 2673.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3330 -0.8297 -0.5748 1.1374 1.6543
##
## Random effects:
## Groups Name Variance Std.Dev.
## district (Intercept) 0.01276 0.113
## Residual 0.22466 0.474
## Number of obs: 1934, groups: district, 60
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.375453 0.019041 19.719
## age 0.001918 0.001208 1.588
##
## Correlation of Fixed Effects:
## (Intr)
## age -0.008
m2_lme <- lmer(use ~ age + (age|district), data = bang)
summary(m2_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: use ~ age + (age | district)
## Data: bang
##
## REML criterion at convergence: 2671.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4695 -0.8214 -0.5763 1.1285 1.6280
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## district (Intercept) 1.268e-02 0.112610
## age 8.134e-06 0.002852 0.68
## Residual 2.240e-01 0.473264
## Number of obs: 1934, groups: district, 60
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.376035 0.018979 19.813
## age 0.001741 0.001276 1.364
##
## Correlation of Fixed Effects:
## (Intr)
## age 0.165
library(lme4)
stargazer(cpooling, m1_lme, m2_lme, type = "html")
| Dependent variable: | |||
| use | |||
| normal | linear | ||
| mixed-effects | |||
| (1) | (2) | (3) | |
| age | 0.002 | 0.002 | 0.002 |
| (0.001) | (0.001) | (0.001) | |
| Constant | 0.392*** | 0.375*** | 0.376*** |
| (0.011) | (0.019) | (0.019) | |
| Observations | 1,934 | 1,934 | 1,934 |
| Log Likelihood | -1,358.061 | -1,336.895 | -1,335.765 |
| Akaike Inf. Crit. | 2,720.122 | 2,681.790 | 2,683.530 |
| Bayesian Inf. Crit. | 2,704.060 | 2,716.934 | |
| Note: | p<0.1; p<0.05; p<0.01 | ||
library(texreg)
screenreg(list(m1_lme, m2_lme))
##
## ==========================================================
## Model 1 Model 2
## ----------------------------------------------------------
## (Intercept) 0.38 *** 0.38 ***
## (0.02) (0.02)
## age 0.00 0.00
## (0.00) (0.00)
## ----------------------------------------------------------
## AIC 2681.79 2683.53
## BIC 2704.06 2716.93
## Log Likelihood -1336.90 -1335.76
## Num. obs. 1934 1934
## Num. groups: district 60 60
## Variance: district.(Intercept) 0.01 0.01
## Variance: Residual 0.22 0.22
## Variance: district.age 0.00
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
m3_lme <- lmer(use ~ age + (age|district) + lc, data = bang)
m4_lme <- lmer(use ~ age*lc + (age|district), data = bang)
screenreg(list(m3_lme, m4_lme))
##
## ==========================================================
## Model 1 Model 2
## ----------------------------------------------------------
## (Intercept) 0.18 *** 0.25 ***
## (0.03) (0.04)
## age -0.01 *** 0.01 **
## (0.00) (0.00)
## lc 0.08 *** 0.07 ***
## (0.01) (0.01)
## age:lc -0.01 ***
## (0.00)
## ----------------------------------------------------------
## AIC 2711.23 2700.56
## BIC 2750.20 2745.10
## Log Likelihood -1348.62 -1342.28
## Num. obs. 1934 1934
## Num. groups: district 60 60
## Variance: district.(Intercept) 0.00 0.00
## Variance: district.age 0.00 0.00
## Variance: Residual 0.23 0.23
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
m3_lme <- lmer(use ~ age + (age|district) + lc, data = bang)
m4a_lme <- lmer(use ~ age*clc + (age|district), data = bang)
screenreg(list(m4_lme, m4a_lme))
##
## ==========================================================
## Model 1 Model 2
## ----------------------------------------------------------
## (Intercept) 0.25 *** 0.44 ***
## (0.04) (0.01)
## age 0.01 ** -0.00 **
## (0.00) (0.00)
## lc 0.07 ***
## (0.01)
## age:lc -0.01 ***
## (0.00)
## clc 0.07 ***
## (0.01)
## age:clc -0.01 ***
## (0.00)
## ----------------------------------------------------------
## AIC 2700.56 2700.56
## BIC 2745.10 2745.10
## Log Likelihood -1342.28 -1342.28
## Num. obs. 1934 1934
## Num. groups: district 60 60
## Variance: district.(Intercept) 0.00 0.00
## Variance: district.age 0.00 0.00
## Variance: Residual 0.23 0.23
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05
m0_lme <- lmer(use ~ 1 + (1|district), data = bang)
summary(m0_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: use ~ 1 + (1 | district)
## Data: bang
##
## REML criterion at convergence: 2664.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2531 -0.8063 -0.5738 1.1268 1.6078
##
## Random effects:
## Groups Name Variance Std.Dev.
## district (Intercept) 0.01267 0.1126
## Residual 0.22487 0.4742
## Number of obs: 1934, groups: district, 60
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.3757 0.0190 19.77
In conclusion, the district variable has some impact on weather a woman uses contraceptives, with an ICC of .053. This makes sense as certain areas have better access and availability for women to obtain contraceptives. Additionally, as the number of children a woman had went up, so did the likelihood of contraceptive usage significantly. Age was less significant than number of children in general. However, further analysis proved that age was significantly negatively correlated to number of living children in relation to contraceptive usage. Meaning younger women, with less children are the least likely to use contraceptives. Further analysis should look at socio-economic status and education in relation to contraceptive usage as many woman around the world still aren’t provided with the basic knowledge to make such decisions.