HW Instructions

1. Define a binary outcome variable of your choosing and define how you recode the original variable.

Fertility preference: whether women aged 15-49 want another child (V602), recoded so that 1 = wants another child and 0 = all other responses.

2. State a research question about what factors you believe will affect your outcome variable.

Does women’s empowerment affect future desired fertility in Uganda?

I am interested in testing whether certain proxy variables for empowerment affect whether or not Ugandan women indicate that they want another child.

3. Define at least 2 predictor variables, based on your research question. For this assignment, it’s best if these are categorical variables.

Proxies for empowerment

access to a bank account (V170)

education level (V106)

use of internet (V171A)

use of contraception, by method type (V313)

# load packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(ggplot2)
library(haven)
library(tableone)
library(gtsummary)
#read in data
uganda16 <- read_dta("C:/Users/rlutt/Downloads/UGIR7BFL.DTA")
uganda16<-zap_labels(uganda16)
#recodes
#bank account
uganda16$bank<-car::Recode(uganda16$v170, recodes= "0= 'No'; 1= 'Yes'")
#age groups
uganda16$agegroup <- car::Recode(uganda16$v013, recodes = "1='15-19'; 2='20-24';3='25-29'; 4 = '30-34';5='35-39';6='40-44';7='45+' ", as.factor=T)
#fertility preferences: 1 = wants another child (v602 == 1), 0 = all other responses
uganda16$wantanotherchild<-ifelse(uganda16$v602==1,1,0)
uganda16$fertilitypreference<-car::Recode(uganda16$wantanotherchild, recodes= "0= 'Do Not Want another child'; 1= 'Do Want another child'")
#education level
uganda16$educationlevel <- car::Recode(uganda16$v106, 
                            recodes = "0 = 'none'; 1 = 'primary'; 2:3='secondary and above' ",
                            as.factor=T)
#internet (Recode with as.factor=T already returns a factor)
uganda16$internet<-car::Recode(uganda16$v171a, recodes= "0='never'; 1= 'in the past year'; 2='over a year ago'; 3= 'yes, but unsure when'", as.factor=T)

#contraception, by method type (Recode with as.factor=T already returns a factor)
uganda16$contraception<-car::Recode(uganda16$v313, recodes= "0='none'; 1='folkloric method'; 2='traditional method' ;3= 'modern method'", as.factor=T)

# survey design variables
uganda16$psu <- uganda16$v021
uganda16$strata <- uganda16$v022
uganda16$pwt <- uganda16$v005/1000000
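
A quick way to confirm that a recode did what was intended is to cross-tabulate the original codes against the new variable. The sketch below uses a made-up vector in place of v602 (the values are illustrative and are not taken from the Uganda file). It also shows a hypothetical alternative, `want_na`, that keeps code 9 as missing instead of folding it into 0; whether that is the better choice depends on how the survey defines code 9.

```r
# Toy stand-in for v602; these values are invented for illustration only.
v602 <- c(1, 3, 2, 9, 1, NA, 4)

# Recode used in this report: 1 = wants another child, every other code = 0.
want_all <- ifelse(v602 != 9 & v602 == 1, 1, 0)

# Hypothetical alternative: treat code 9 as missing rather than as "no".
want_na <- ifelse(v602 == 9, NA, ifelse(v602 == 1, 1, 0))

# Cross-tabulate the original codes against each recode to check the mapping.
check_all <- table(original = v602, recoded = want_all, useNA = "ifany")
check_na  <- table(original = v602, recoded = want_na,  useNA = "ifany")
print(check_all)
print(check_na)
```

In the first table code 9 lands in the 0 column; in the second it lands in the NA column, so it would drop out of the percentages.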

4. Perform a descriptive analysis of the outcome variable by each of the variables you defined in part b. (e.g. 2 x 2 table, 2 x k table). Follow a similar approach to presenting your statistics as presented in Sparks 2009 (in the Google drive). This can be done easily using the tableone package!

4.1 Calculate descriptive statistics (mean or percentages) for each variable using no weights or survey design, as well as with full survey design and weights.

descriptives table w/o weights or stratification

library(gt)
myvars<- c("bank", "agegroup", "internet", "contraception")
table<-CreateTableOne(vars=myvars, data=uganda16)
summary(table)
## 
##      ### Summary of categorical variables ### 
## 
## strata: Overall
##            var     n miss p.miss              level  freq percent cum.percent
##           bank 18506    0    0.0                 No 16218    87.6        87.6
##                                                 Yes  2288    12.4       100.0
##                                                                              
##       agegroup 18506    0    0.0              15-19  4276    23.1        23.1
##                                               20-24  3782    20.4        43.5
##                                               25-29  3014    16.3        59.8
##                                               30-34  2600    14.0        73.9
##                                               35-39  2029    11.0        84.8
##                                               40-44  1621     8.8        93.6
##                                                 45+  1184     6.4       100.0
##                                                                              
##       internet 18506    0    0.0   in the past year  1447     7.8         7.8
##                                               never 16847    91.0        98.9
##                                     over a year ago   212     1.1       100.0
##                                                                              
##  contraception 18506    0    0.0   folkloric method    46     0.2         0.2
##                                       modern method  4914    26.6        26.8
##                                                none 13088    70.7        97.5
##                                  traditional method   458     2.5       100.0
## 

descriptives table stratified by fertility preferences w/o weights

library(gt)

myvars<- c("bank", "agegroup", "internet", "contraception")
table1<-CreateTableOne(vars=myvars, data=uganda16, strata= "fertilitypreference")
summary(table1)
## 
##      ### Summary of categorical variables ### 
## 
## fertilitypreference: Do Not Want another child
##            var    n miss p.miss              level freq percent cum.percent
##           bank 7027    0    0.0                 No 6033    85.9        85.9
##                                                Yes  994    14.1       100.0
##                                                                            
##       agegroup 7027    0    0.0              15-19  415     5.9         5.9
##                                              20-24  453     6.4        12.4
##                                              25-29  792    11.3        23.6
##                                              30-34 1319    18.8        42.4
##                                              35-39 1503    21.4        63.8
##                                              40-44 1431    20.4        84.1
##                                                45+ 1114    15.9       100.0
##                                                                            
##       internet 7027    0    0.0   in the past year  257     3.7         3.7
##                                              never 6722    95.7        99.3
##                                    over a year ago   48     0.7       100.0
##                                                                            
##  contraception 7027    0    0.0   folkloric method   32     0.5         0.5
##                                      modern method 2208    31.4        31.9
##                                               none 4605    65.5        97.4
##                                 traditional method  182     2.6       100.0
##                                                                            
## ------------------------------------------------------------ 
## fertilitypreference: Do Want another child
##            var     n miss p.miss              level  freq percent cum.percent
##           bank 11479    0    0.0                 No 10185    88.7        88.7
##                                                 Yes  1294    11.3       100.0
##                                                                              
##       agegroup 11479    0    0.0              15-19  3861    33.6        33.6
##                                               20-24  3329    29.0        62.6
##                                               25-29  2222    19.4        82.0
##                                               30-34  1281    11.2        93.2
##                                               35-39   526     4.6        97.7
##                                               40-44   190     1.7        99.4
##                                                 45+    70     0.6       100.0
##                                                                              
##       internet 11479    0    0.0   in the past year  1190    10.4        10.4
##                                               never 10125    88.2        98.6
##                                     over a year ago   164     1.4       100.0
##                                                                              
##  contraception 11479    0    0.0   folkloric method    14     0.1         0.1
##                                       modern method  2706    23.6        23.7
##                                                none  8483    73.9        97.6
##                                  traditional method   276     2.4       100.0
##                                                                              
## 
## p-values
##                    pApprox       pExact
## bank          9.538620e-09 1.121678e-08
## agegroup      0.000000e+00           NA
## internet      1.369170e-65 2.192775e-72
## contraception 2.104382e-35 3.523359e-35
## 
## Standardize mean differences
##                   1 vs 2
## bank          0.08632758
## agegroup      1.61187506
## internet      0.27742107
## contraception 0.19174423
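
To see where the test statistics in this table come from, the sketch below rebuilds the bank account rows from the counts printed above and applies base R's `chisq.test` and `fisher.test`. My understanding is that these are the defaults behind `pApprox` and `pExact` for categorical variables in tableone, but treat that as an assumption. The standardized mean difference uses the usual formula for a binary variable.

```r
# Counts copied from the unweighted table stratified by fertility preference.
bank_tab <- matrix(
  c(6033, 994,      # do not want another child: bank = No, Yes
    10185, 1294),   # do want another child:     bank = No, Yes
  nrow = 2,
  dimnames = list(bank = c("No", "Yes"),
                  fertilitypreference = c("Do Not Want", "Do Want"))
)

# Approximate test (chi-square with continuity correction for a 2 x 2 table)
approx_test <- chisq.test(bank_tab)
# Exact test
exact_test <- fisher.test(bank_tab)

# Proportion with a bank account in each fertility-preference group
p_yes <- bank_tab["Yes", ] / colSums(bank_tab)

# Standardized mean difference for a binary variable
smd <- unname(abs(diff(p_yes)) / sqrt(mean(p_yes * (1 - p_yes))))

print(approx_test$p.value)
print(exact_test$p.value)
print(round(smd, 4))
```

The standardized difference should come out at about 0.086, in line with the bank row of the table.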

descriptives table with weights and no stratification

design<-svydesign(ids = ~ psu, strata = ~ strata, weights =~ pwt, data=uganda16)
table2<-svyCreateTableOne(vars = c( "contraception", "internet", "agegroup", "bank"),
                  data=design)
summary(table2)
## 
##      ### Summary of categorical variables ### 
## 
## : Overall
##            var       n miss p.miss              level    freq percent cum.percent
##  contraception 18506.0  0.0    0.0   folkloric method    45.4     0.2         0.2
##                                         modern method  5050.0    27.3        27.5
##                                                  none 12904.7    69.7        97.3
##                                    traditional method   506.0     2.7       100.0
##                                                                                  
##       internet 18506.0  0.0    0.0   in the past year  1596.5     8.6         8.6
##                                                 never 16691.3    90.2        98.8
##                                       over a year ago   218.2     1.2       100.0
##                                                                                  
##       agegroup 18506.0  0.0    0.0              15-19  4264.0    23.0        23.0
##                                                 20-24  3821.8    20.7        43.7
##                                                 25-29  3051.4    16.5        60.2
##                                                 30-34  2543.1    13.7        73.9
##                                                 35-39  2011.1    10.9        84.8
##                                                 40-44  1607.8     8.7        93.5
##                                                   45+  1206.7     6.5       100.0
##                                                                                  
##           bank 18506.0  0.0    0.0                 No 16111.8    87.1        87.1
##                                                   Yes  2394.2    12.9       100.0
## 
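
The weighted `freq` and `percent` columns are sums of weights rather than counts of rows. The sketch below shows the arithmetic in base R on a tiny made-up data frame (`toy` and its values are hypothetical). The toy weights are chosen to sum to the number of rows, which mirrors the weighted n of 18506.0 in the table above.

```r
# Hypothetical toy data: six respondents with weights that sum to 6.
toy <- data.frame(
  bank = c("No", "No", "Yes", "No", "Yes", "No"),
  pwt  = c(0.5, 1.5, 2.0, 0.8, 0.7, 0.5)
)

# Unweighted percentages: each row counts once.
unweighted_pct <- prop.table(table(toy$bank)) * 100

# Weighted frequencies: each row counts pwt times.
weighted_freq <- tapply(toy$pwt, toy$bank, sum)
weighted_pct  <- weighted_freq / sum(toy$pwt) * 100

print(round(unweighted_pct, 1))
print(round(weighted_pct, 1))
print(sum(toy$pwt))   # weighted n equals the number of rows in this toy example
```

Here the two respondents with a bank account carry more weight than average, so their weighted share is larger than their unweighted share even though the total n is unchanged.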

4.2 Calculate percentages, or means, for each of your independent variables for each level of your outcome variable and present this in a table, with appropriate survey-corrected test statistics. (tableone package helps)

descriptives table stratified by fertility preferences w/ weights

design<-svydesign(ids = ~ psu, strata = ~ strata, weights =~ pwt, data=uganda16)
table3<-svyCreateTableOne(vars = c( "contraception", "internet", "agegroup", "bank"),
                  strata = c("fertilitypreference"), 
                  data=design)
summary(table3)
## 
##      ### Summary of categorical variables ### 
## 
## fertilitypreference: Do Not Want another child
##            var      n miss p.miss              level   freq percent cum.percent
##  contraception 7043.8  0.0    0.0   folkloric method   33.1     0.5         0.5
##                                        modern method 2226.6    31.6        32.1
##                                                 none 4582.0    65.1        97.1
##                                   traditional method  202.1     2.9       100.0
##                                                                                
##       internet 7043.8  0.0    0.0   in the past year  283.4     4.0         4.0
##                                                never 6710.9    95.3        99.3
##                                      over a year ago   49.5     0.7       100.0
##                                                                                
##       agegroup 7043.8  0.0    0.0              15-19  452.4     6.4         6.4
##                                                20-24  460.6     6.5        13.0
##                                                25-29  790.1    11.2        24.2
##                                                30-34 1287.2    18.3        42.5
##                                                35-39 1493.0    21.2        63.6
##                                                40-44 1426.4    20.3        83.9
##                                                  45+ 1134.1    16.1       100.0
##                                                                                
##           bank 7043.8  0.0    0.0                 No 6031.1    85.6        85.6
##                                                  Yes 1012.7    14.4       100.0
##                                                                                
## ------------------------------------------------------------ 
## fertilitypreference: Do Want another child
##            var       n miss p.miss              level    freq percent cum.percent
##  contraception 11462.2  0.0    0.0   folkloric method    12.2     0.1         0.1
##                                         modern method  2823.4    24.6        24.7
##                                                  none  8322.7    72.6        97.3
##                                    traditional method   303.9     2.7       100.0
##                                                                                  
##       internet 11462.2  0.0    0.0   in the past year  1313.1    11.5        11.5
##                                                 never  9980.3    87.1        98.5
##                                       over a year ago   168.7     1.5       100.0
##                                                                                  
##       agegroup 11462.2  0.0    0.0              15-19  3811.7    33.3        33.3
##                                                 20-24  3361.2    29.3        62.6
##                                                 25-29  2261.3    19.7        82.3
##                                                 30-34  1255.9    11.0        93.3
##                                                 35-39   518.1     4.5        97.8
##                                                 40-44   181.4     1.6        99.4
##                                                   45+    72.6     0.6       100.0
##                                                                                  
##           bank 11462.2  0.0    0.0                 No 10080.6    87.9        87.9
##                                                   Yes  1381.6    12.1       100.0
##                                                                                  
## 
## p-values
##                    pApprox pExact
## contraception 1.690557e-22     NA
## internet      1.205203e-35     NA
## agegroup      0.000000e+00     NA
## bank          3.923077e-04     NA
## 
## Standardize mean differences
##                   1 vs 2
## contraception 0.17493730
## internet      0.29329347
## agegroup      1.59695403
## bank          0.06864078
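
For a single predictor, the survey-corrected test can also be run directly with the survey package. The sketch below uses simulated data, so `toy`, `des`, and every value in them are hypothetical and the resulting p-value means nothing substantively. It only shows the calls: `svytable` for the weighted cross-tabulation and `svychisq` for a design-based test of independence, which as far as I know is the kind of test `svyCreateTableOne` reports for categorical variables.

```r
library(survey)

set.seed(1)
n <- 400

# Simulated stand-in for the survey file: 2 strata, 10 PSUs per stratum.
toy <- data.frame(
  strata = rep(1:2, each = n / 2),
  psu    = rep(1:20, each = 20),
  pwt    = runif(n, 0.5, 1.5),
  bank   = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.85, 0.15))),
  want   = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Same design specification as in the report, applied to the toy data.
des <- svydesign(ids = ~psu, strata = ~strata, weights = ~pwt, data = toy)

# Weighted 2 x 2 table: cell entries are sums of weights.
wtab <- svytable(~bank + want, des)
print(wtab)

# Design-based test of independence (svychisq's default adjusted statistic).
res <- svychisq(~bank + want, des)
print(res)
```

Because the test accounts for clustering within PSUs, its p-values are generally larger than those from an ordinary chi-square test on the same percentages.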

4.3 Are there substantive differences in the descriptive results between the analysis using survey design and that not using survey design?

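The weighted and unweighted descriptives are closer than I expected. The overall weighted n is 18,506.0, the same as the unweighted 18,506, and the group sizes barely move (7,027 vs. 7,043.8 women who do not want another child; 11,479 vs. 11,462.2 who do). The percentages shift by about one percentage point or less: for example, bank account ownership goes from 12.4% to 12.9%, and internet use in the past year goes from 7.8% to 8.6%. The larger difference is in the test statistics. Once the survey design is taken into account, the p-value for bank account ownership rises from 9.5e-09 to 3.9e-04, and the p-values for internet use and contraception also increase, although every association remains statistically significant. For these data, then, the weights change the point estimates only modestly, but the survey design matters for inference.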
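
One way to make the comparison in 4.3 concrete is to line up a few of the overall percentages from the unweighted and weighted tables above and compute the differences. The data frame name `cmp` is just for illustration; the numbers are copied from the printed output.

```r
# Overall percentages copied from the unweighted and weighted summary tables.
cmp <- data.frame(
  level      = c("bank: Yes",
                 "internet: in the past year",
                 "contraception: modern method",
                 "contraception: none"),
  unweighted = c(12.4, 7.8, 26.6, 70.7),
  weighted   = c(12.9, 8.6, 27.3, 69.7)
)

# Difference in percentage points (weighted minus unweighted)
cmp$diff <- round(cmp$weighted - cmp$unweighted, 1)
print(cmp)

# Largest absolute shift among these levels
largest_shift <- max(abs(cmp$diff))
print(largest_shift)
```

Among these levels, the largest shift is one percentage point (contraception: none).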