Survey Data Source: National Household Education Surveys (NHES) Program 2019: Parent and Family Involvement in Education (PFI)

  1. Define a binary outcome variable of your choosing and define how you recode the original variable.

Outcome variable is whether a child enjoys school.

Codebook variable is Item 50: SEENJOY, with levels 1 (Strongly agree) to 4 (Strongly disagree).

I will recode the variable as enjoy_school, with levels 1 and 2 as “1” (YES) and levels 3 and 4 as “0” (NO).
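
The recode described above can be sketched on a toy vector. This is only an illustration: it uses base R rather than car::Recode, and the input values (including -1 standing in for a response outside 1 to 4) are made up, not taken from the PFI file.

```r
# Hypothetical SEENJOY responses; -1 stands in for any code outside 1-4
seenjoy <- c(1, 2, 3, 4, -1)

# 1-2 (agree) -> 1, 3-4 (disagree) -> 0, anything else -> NA
enjoy_school <- ifelse(seenjoy %in% 1:2, 1,
                ifelse(seenjoy %in% 3:4, 0, NA))

enjoy_school
```

Coding the outcome as 0/1 means its mean within any group is the proportion of children who enjoy school, which is what the later tables report.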

  1. State a research question about what factors you believe will affect your outcome variable.

How does parent volunteerism at school affect whether a child enjoys school?

How does a diagnosed developmental delay affect whether a child enjoys school?

  1. Define at least 2 predictor variables, based on your research question. For this assignment, it’s best if these are categorical variables.

Predictor 1: adult_volunteer; Item 60B: FSVOL “… has any adult in this child’s household … served as a volunteer in this child’s classroom or elsewhere in the school?”

Predictor 2: dev_delay; Item 76K: HDDELAYX “Has a health or education professional told you that this child has … a developmental delay?”

  1. Perform a descriptive analysis of the outcome variable by each of the variables you defined in part b. (e.g. 2 x 2 table, 2 x k table). Follow a similar approach to presenting your statistics as presented in Sparks 2009 (in the Google drive). This can be done easily using the tableone package!
library(haven)
library(car)
## Loading required package: carData
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(questionr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(forcats)
library(tableone)
library(srvyr)
## 
## Attaching package: 'srvyr'
## The following object is masked from 'package:stats':
## 
##     filter
# Read the Stata file
pfi19 <- read_dta(file = "C:\\UTSA\\OneDrive - University of Texas at San Antonio\\1_M_7283_StatsII\\Homework\\pfi_pu_pert_dat_dta.dta")

# Recode the outcome: 1-2 (agree) = 1, 3-4 (disagree) = 0, anything else = NA
# Recode() is car's alias for recode(), which dplyr masks
pfi19$enjoy_school <- Recode(pfi19$SEENJOY, recodes = "1:2=1; 3:4=0; else=NA")

# Recode the predictors as Yes/No factors; as.numeric() drops the haven labels first
pfi19$FSVOL <- as.numeric(pfi19$FSVOL)
pfi19$adult_volunteer <- Recode(pfi19$FSVOL, recodes = "1='Yes'; 2='No'; else=NA", as.factor = TRUE)

pfi19$HDDELAYX <- as.numeric(pfi19$HDDELAYX)
pfi19$dev_delay <- Recode(pfi19$HDDELAYX, recodes = "1='Yes'; 2='No'; else=NA", as.factor = TRUE)

# Keep only cases with non-missing values on all three variables
pfi19 <- pfi19 %>%
  filter(!is.na(enjoy_school),
         !is.na(adult_volunteer),
         !is.na(dev_delay))
  1. Calculate descriptive statistics (mean or percentages) for each variable using no weights or survey design, as well as with full survey design and weights.
library(srvyr)

options(survey.lonely.psu = "adjust")

pfi19design <- svydesign(ids = ~PPSU, strata= ~PSTRATUM, weights = ~FPWT, data = pfi19, nest = TRUE)
pfi19design
## Stratified Independent Sampling design (with replacement)
## svydesign(ids = ~PPSU, strata = ~PSTRATUM, weights = ~FPWT, data = pfi19, 
##     nest = TRUE)
nodesign <- svydesign(ids = ~1,  weights = ~1, data = pfi19)
nodesign
## Independent Sampling design (with replacement)
## svydesign(ids = ~1, weights = ~1, data = pfi19)
vol_deswts <- svyby(formula = ~enjoy_school,
           by = ~adult_volunteer,
           design = pfi19design,
           FUN = svymean)

svychisq(~enjoy_school+adult_volunteer, design = pfi19design)
## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + adult_volunteer, design = pfi19design)
## F = 77.128, ndf = 1, ddf = 15684, p-value < 2.2e-16
knitr::kable(vol_deswts,
      caption = "Survey Estimates of Student Enjoying School by Household Adult School Volunteer",
      align = 'c',  
      format = "html")
Survey Estimates of Student Enjoying School by Household Adult School Volunteer
adult_volunteer   enjoy_school   se
No                0.8751728      0.0051769
Yes               0.9314662      0.0038289
vol_nodesign <- svyby(formula = ~enjoy_school,
           by = ~adult_volunteer,
           design = nodesign,
           FUN = svymean)

svychisq(~enjoy_school+adult_volunteer, design = nodesign)
## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + adult_volunteer, design = nodesign)
## F = 136.48, ndf = 1, ddf = 15686, p-value < 2.2e-16
knitr::kable(vol_nodesign,
      caption = "Estimates of Student Enjoying School by Household Adult School Volunteer - No survey design",
      align = 'c',  
      format = "html")
Estimates of Student Enjoying School by Household Adult School Volunteer - No survey design
adult_volunteer   enjoy_school   se
No                0.8596530      0.0036515
Yes               0.9198554      0.0033327
dev_deswts <- svyby(formula = ~enjoy_school,
           by = ~dev_delay,
           design = pfi19design,
           FUN = svymean)

svychisq(~enjoy_school+dev_delay, design = pfi19design)
## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + dev_delay, design = pfi19design)
## F = 42.74, ndf = 1, ddf = 15684, p-value = 6.445e-11
knitr::kable(dev_deswts,
      caption = "Survey Estimates of Student Enjoying School by Developmental Delay diagnosis",
      align = 'c',  
      format = "html")
Survey Estimates of Student Enjoying School by Developmental Delay diagnosis
dev_delay   enjoy_school   se
No          0.9024252      0.0034473
Yes         0.7823739      0.0245828
dev_nodesign <- svyby(formula = ~enjoy_school,
           by = ~dev_delay,
           design = nodesign,
           FUN = svymean)

svychisq(~enjoy_school+dev_delay, design = nodesign)
## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + dev_delay, design = nodesign)
## F = 39.516, ndf = 1, ddf = 15686, p-value = 3.341e-10
knitr::kable(dev_nodesign,
      caption = "Estimates of Student Enjoying School by Developmental Delay diagnosis - No survey design",
      align = 'c',  
      format = "html")
Estimates of Student Enjoying School by Developmental Delay diagnosis - No survey design
dev_delay   enjoy_school   se
No          0.8882637      0.0025633
Yes         0.8034483      0.0165013
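
To make the four tables above easier to compare, the reported estimates can be placed side by side. The sketch below only re-enters the numbers printed above (it does not re-run the survey analysis); the data frame and its column names are introduced here for illustration.

```r
# Estimates and standard errors copied from the four svyby() tables above
est <- data.frame(
  group       = c("volunteer: No", "volunteer: Yes", "delay: No", "delay: Yes"),
  p_design    = c(0.8751728, 0.9314662, 0.9024252, 0.7823739),
  se_design   = c(0.0051769, 0.0038289, 0.0034473, 0.0245828),
  p_nodesign  = c(0.8596530, 0.9198554, 0.8882637, 0.8034483),
  se_nodesign = c(0.0036515, 0.0033327, 0.0025633, 0.0165013)
)

# Shift in the point estimate (design minus no design), in percentage points
est$diff_pp <- round(100 * (est$p_design - est$p_nodesign), 1)

# How many times larger the design-based standard error is
est$se_ratio <- round(est$se_design / est$se_nodesign, 2)

est[, c("group", "diff_pp", "se_ratio")]
```

A positive diff_pp means the design-based estimate is higher, and an se_ratio above 1 means the design-based standard error is larger.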
  1. Calculate percentages, or means, for each of your independent variables for each level of your outcome variable and present this in a table, with appropriate survey-corrected test statistics. (tableone package helps)
library(gtsummary)

# for household adult volunteerism
# (ids and nest are given explicitly so the design matches pfi19design)

pfi19 %>%
  as_survey_design(ids = PPSU,
                   strata = PSTRATUM,
                   weights = FPWT,
                   nest = TRUE) %>%
  select(enjoy_school, adult_volunteer) %>%
  tbl_svysummary(by = adult_volunteer, 
              label = list(enjoy_school = "Child Enjoys School")) %>%
  add_p()
## i Column(s) enjoy_school are class "haven_labelled". This is an intermediate datastructure not meant for analysis. Convert columns with `haven::as_factor()`, `labelled::to_factor()`, `labelled::unlabelled()`, and `unclass()`. "haven_labelled" value labels are ignored when columns are not converted. Failure to convert may have unintended consequences or result in error.
## * https://haven.tidyverse.org/articles/semantics.html
## * https://larmarange.github.io/labelled/articles/intro_labelled.html#unlabelled
Characteristic        No, N = 29,773,653¹    Yes, N = 20,925,617¹   p-value²
Child Enjoys School                                                 <0.001
  0                   3,716,561 (12%)        1,434,112 (6.9%)
  1                   26,057,092 (88%)       19,491,505 (93%)

¹ n (%)
² chi-squared test with Rao & Scott's second-order correction

# for developmental delay
# (ids and nest are given explicitly so the design matches pfi19design)

pfi19 %>%
  as_survey_design(ids = PPSU,
                   strata = PSTRATUM,
                   weights = FPWT,
                   nest = TRUE) %>%
  select(enjoy_school, dev_delay) %>%
  tbl_svysummary(by = dev_delay, 
              label = list(enjoy_school = "Child Enjoys School")) %>%
  add_p()
## i Column(s) enjoy_school are class "haven_labelled". This is an intermediate datastructure not meant for analysis. Convert columns with `haven::as_factor()`, `labelled::to_factor()`, `labelled::unlabelled()`, and `unclass()`. "haven_labelled" value labels are ignored when columns are not converted. Failure to convert may have unintended consequences or result in error.
## * https://haven.tidyverse.org/articles/semantics.html
## * https://larmarange.github.io/labelled/articles/intro_labelled.html#unlabelled
Characteristic        No, N = 49,002,465¹    Yes, N = 1,696,805¹    p-value²
Child Enjoys School                                                 <0.001
  0                   4,781,404 (9.8%)       369,269 (22%)
  1                   44,221,061 (90%)       1,327,536 (78%)

¹ n (%)
² chi-squared test with Rao & Scott's second-order correction
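
As a quick consistency check, the weighted counts in these two tables can be turned back into proportions and compared with the design-based svyby() estimates reported earlier. The sketch below only re-enters the printed counts; the vector names are introduced here for illustration.

```r
# Weighted counts of children who enjoy school (row "1" in the tables above)
enjoys <- c(vol_no = 26057092, vol_yes = 19491505,
            delay_no = 44221061, delay_yes = 1327536)

# Weighted group totals (the N in each column header)
total <- c(vol_no = 29773653, vol_yes = 20925617,
           delay_no = 49002465, delay_yes = 1696805)

# Weighted proportion enjoying school within each group
round(enjoys / total, 4)
```

These proportions should agree, to rounding, with the enjoy_school column of the two design-based svyby() tables.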

  1. Are there substantive differences in the descriptive results between the analysis using survey design and that not using survey design?

Yes.

For both levels of adult_volunteer, the estimated proportion of students who enjoy school is higher with the survey design than without it: 87.5% vs. 86.0% where no household adult volunteered, and 93.1% vs. 92.0% where one did.

For students with a diagnosed developmental delay, the proportion who enjoy school is lower with the survey design (78.2%) than without it (80.3%). For students without a diagnosed delay, it is higher (90.2% vs. 88.8%).

As expected, the standard errors are larger in every group when the survey design is used; for example, 0.0246 vs. 0.0165 for students with a developmental delay.
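
A minimal sketch of why weighting can move a proportion in either direction: if the respondents with the largest weights differ systematically on the outcome, the weighted and unweighted means diverge. The outcome values and weights below are entirely hypothetical and are not drawn from the PFI data.

```r
# Hypothetical outcome for six children: 1 = enjoys school, 0 = does not
y <- c(1, 1, 1, 0, 1, 0)

# Hypothetical sampling weights: the two children coded 0 each stand in
# for four times as many children in the population as the others
w <- c(100, 100, 100, 400, 100, 400)

mean(y)              # unweighted proportion
weighted.mean(y, w)  # weighted proportion
```

Here the weighted proportion is lower than the unweighted one because the heavily weighted cases are the ones coded 0; reversing the weights would push it higher instead.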