7283_HW2

Survey Data Source: National Household Education Surveys (NHES) Program 2019: Parent and Family Involvement in Education (PFI)

Define a binary outcome variable of your choosing and define how you recode the original variable.

The outcome variable is whether the child enjoys school.

The codebook variable is Item 50: SEENJOY, with levels 1 (Strongly agree) through 4 (Strongly disagree).

I recode the variable as enjoy_school, with levels 1 and 2 (agree) coded as “1” (YES) and levels 3 and 4 (disagree) coded as “0” (NO); any other value is set to missing (NA).
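The recode rule can be checked on a few made-up codes. This is a minimal base-R sketch, not the car::Recode call used in the analysis; the helper `recode_enjoy`, the vector `seenjoy_toy`, and the value -1 are hypothetical, chosen only to show how values outside 1 to 4 end up missing, as the `else=NA` clause in the analysis code does.

```r
# Hypothetical helper mirroring the recode rule for SEENJOY:
# 1 (Strongly agree) or 2 -> 1 (YES); 3 or 4 (Strongly disagree) -> 0 (NO);
# any other value, including NA, -> NA
recode_enjoy <- function(x) {
  ifelse(x %in% 1:2, 1, ifelse(x %in% 3:4, 0, NA))
}

seenjoy_toy <- c(1, 2, 3, 4, -1, NA)  # made-up codes, not NHES data
recode_enjoy(seenjoy_toy)
# [1]  1  1  0  0 NA NA
```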

State a research question about what factors you believe will affect your outcome variable.

How does volunteering at school by an adult in the child's household affect whether the child enjoys school?

How does a diagnosed developmental delay affect whether the child enjoys school?

Define at least 2 predictor variables, based on your research question. For this assignment, it’s best if these are categorical variables.

Predictor 1: adult_volunteer; Item 60B: FSVOL “… has any adult in this child’s household … served as a volunteer in this child’s classroom or elsewhere in the school?”

Predictor 2: dev_delay; Item 76K: HDDELAYX “Has a health or education professional told you that this child has … a developmental delay?”

Perform a descriptive analysis of the outcome variable by each of the variables you defined in part b. (e.g. 2 x 2 table, 2 x k table). Follow a similar approach to presenting your statistics as presented in Sparks 2009 (in the Google drive). This can be done easily using the tableone package!

library(haven)
library(car)

library(stargazer)

library(survey)

library(questionr)
library(dplyr)

library(forcats)
library(tableone)
library(srvyr)

# Read Stata file

pfi19 = read_dta(file = "C:\\UTSA\\OneDrive - University of Texas at San Antonio\\1_M_7283_StatsII\\Homework\\pfi_pu_pert_dat_dta.dta")

# Recode variables
# Outcome: 1-2 (agree) = 1, 3-4 (disagree) = 0, any other value = NA

pfi19$enjoy_school <- Recode(pfi19$SEENJOY, recodes = "1:2=1; 3:4=0; else=NA")

# Predictors: 1 = Yes, 2 = No, any other value = NA (stored as factors)

pfi19$FSVOL <- as.numeric(pfi19$FSVOL)

pfi19$adult_volunteer <- Recode(pfi19$FSVOL, recodes = "1='Yes'; 2='No'; else=NA", as.factor = TRUE)

pfi19$HDDELAYX <- as.numeric(pfi19$HDDELAYX)

pfi19$dev_delay <- Recode(pfi19$HDDELAYX, recodes = "1='Yes'; 2='No'; else=NA", as.factor = TRUE)

# Filter cases: keep complete cases on the outcome and both predictors

pfi19 <- pfi19 %>%
  filter(!is.na(enjoy_school),
         !is.na(adult_volunteer),
         !is.na(dev_delay))

Calculate descriptive statistics (mean or percentages) for each variable using no weights or survey design, as well as with full survey design and weights.

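# Strata with only one PSU are centered at the grand mean when computing variances
options(survey.lonely.psu = "adjust")

# Full survey design: PSUs (PPSU) nested within strata (PSTRATUM), weighted by FPWT
pfi19design <- svydesign(ids = ~PPSU, strata = ~PSTRATUM, weights = ~FPWT, data = pfi19, nest = TRUE)
pfi19design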

## Stratified Independent Sampling design (with replacement)
## svydesign(ids = ~PPSU, strata = ~PSTRATUM, weights = ~FPWT, data = pfi19, 
##     nest = TRUE)

# Comparison design with no clustering, strata, or weights (every case has weight 1)
nodesign <- svydesign(ids = ~1, weights = ~1, data = pfi19)
nodesign

## Independent Sampling design (with replacement)
## svydesign(ids = ~1, weights = ~1, data = pfi19)
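The two designs produce different point estimates only through the weights: strata and PSUs change the standard errors, not the estimated proportion. A minimal base-R sketch with made-up outcomes and weights (not NHES values) shows the weighted proportion, sum(w * y) / sum(w), that `svymean` reports for a 0/1 variable, next to the plain mean that the equal-weight `nodesign` reproduces.

```r
# Made-up 0/1 outcomes (1 = enjoys school) and sampling weights
y <- c(1, 1, 0, 1, 0)
w <- c(100, 300, 100, 100, 400)

unweighted <- mean(y)                 # each case counts once: 3/5
weighted   <- sum(w * y) / sum(w)     # weighted proportion: 500/1000

c(unweighted = unweighted, weighted = weighted)

# Base R's weighted.mean() computes the same weighted proportion
all.equal(weighted, weighted.mean(y, w))
```

In this toy example the heavier weights on the cases coded 0 pull the weighted proportion (0.5) below the unweighted one (0.6).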

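# Design-based proportion enjoying school within each volunteer group
vol_deswts <- svyby(formula = ~enjoy_school,
                    by = ~adult_volunteer,
                    design = pfi19design,
                    FUN = svymean)

# Rao-Scott adjusted chi-squared test of association under the full design
svychisq(~enjoy_school + adult_volunteer, design = pfi19design)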

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + adult_volunteer, design = pfi19design)
## F = 77.128, ndf = 1, ddf = 15684, p-value < 2.2e-16

knitr::kable(vol_deswts,
      caption = "Survey Estimates of Student Enjoying School by Household Adult School Volunteer",
      align = 'c',  
      format = "html")

Survey Estimates of Student Enjoying School by Household Adult School Volunteer
	adult_volunteer	enjoy_school	se
No	No	0.8751728	0.0051769
Yes	Yes	0.9314662	0.0038289

# Same proportions and test without weights or design information
vol_nodesign <- svyby(formula = ~enjoy_school,
                      by = ~adult_volunteer,
                      design = nodesign,
                      FUN = svymean)

svychisq(~enjoy_school + adult_volunteer, design = nodesign)

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + adult_volunteer, design = nodesign)
## F = 136.48, ndf = 1, ddf = 15686, p-value < 2.2e-16

knitr::kable(vol_nodesign,
      caption = "Estimates of Student Enjoying School by Household Adult School Volunteer - No survey design",
      align = 'c',  
      format = "html")

Estimates of Student Enjoying School by Household Adult School Volunteer - No survey design
	adult_volunteer	enjoy_school	se
No	No	0.8596530	0.0036515
Yes	Yes	0.9198554	0.0033327

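# Design-based proportion enjoying school by developmental delay diagnosis
dev_deswts <- svyby(formula = ~enjoy_school,
                    by = ~dev_delay,
                    design = pfi19design,
                    FUN = svymean)

# Rao-Scott adjusted chi-squared test of association under the full design
svychisq(~enjoy_school + dev_delay, design = pfi19design)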

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + dev_delay, design = pfi19design)
## F = 42.74, ndf = 1, ddf = 15684, p-value = 6.445e-11

knitr::kable(dev_deswts,
      caption = "Survey Estimates of Student Enjoying School by Developmental Delay diagnosis",
      align = 'c',  
      format = "html")

Survey Estimates of Student Enjoying School by Developmental Delay diagnosis
	dev_delay	enjoy_school	se
No	No	0.9024252	0.0034473
Yes	Yes	0.7823739	0.0245828

# Same proportions and test without weights or design information
dev_nodesign <- svyby(formula = ~enjoy_school,
                      by = ~dev_delay,
                      design = nodesign,
                      FUN = svymean)

svychisq(~enjoy_school + dev_delay, design = nodesign)

## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~enjoy_school + dev_delay, design = nodesign)
## F = 39.516, ndf = 1, ddf = 15686, p-value = 3.341e-10

knitr::kable(dev_nodesign,
      caption = "Estimates of Student Enjoying School by Developmental Delay diagnosis - No survey design",
      align = 'c',  
      format = "html")

Estimates of Student Enjoying School by Developmental Delay diagnosis - No survey design
	dev_delay	enjoy_school	se
No	No	0.8882637	0.0025633
Yes	Yes	0.8034483	0.0165013

Calculate percentages, or means, for each of your independent variables for each level of your outcome variable and present this in a table, with appropriate survey-corrected test statistics. (tableone package helps)

library(gtsummary)

# for household adult volunteerism (same design as pfi19design)

pfi19 %>%
  as_survey_design(ids = PPSU,
                   strata = PSTRATUM,
                   weights = FPWT,
                   nest = TRUE) %>%
  select(enjoy_school, adult_volunteer) %>%
  tbl_svysummary(by = adult_volunteer,
                 label = list(enjoy_school = "Child Enjoys School")) %>%
  add_p()

## i Column(s) enjoy_school are class "haven_labelled". This is an intermediate datastructure not meant for analysis. Convert columns with `haven::as_factor()`, `labelled::to_factor()`, `labelled::unlabelled()`, and `unclass()`. "haven_labelled" value labels are ignored when columns are not converted. Failure to convert may have unintended consequences or result in error.

## * https://haven.tidyverse.org/articles/semantics.html

## * https://larmarange.github.io/labelled/articles/intro_labelled.html#unlabelled

Characteristic	No, N = 29,773,653¹	Yes, N = 20,925,617¹	p-value²
Child Enjoys School			<0.001
0	3,716,561 (12%)	1,434,112 (6.9%)
1	26,057,092 (88%)	19,491,505 (93%)
¹ n (%) ² chi-squared test with Rao & Scott's second-order correction
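The weighted counts in this table line up with the design-based `svyby` estimates shown earlier (0.8751728 for No, 0.9314662 for Yes). A quick base-R check using only numbers printed in this document:

```r
# Weighted counts of children who enjoy school (enjoy_school = 1) and group totals
enjoy_n <- c(No = 26057092, Yes = 19491505)
group_N <- c(No = 29773653, Yes = 20925617)

# Proportion enjoying school within each volunteer group
round(enjoy_n / group_N, 4)
# svyby estimates reported above: No 0.8751728, Yes 0.9314662
```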

# for developmental delay (same design as pfi19design)

pfi19 %>%
  as_survey_design(ids = PPSU,
                   strata = PSTRATUM,
                   weights = FPWT,
                   nest = TRUE) %>%
  select(enjoy_school, dev_delay) %>%
  tbl_svysummary(by = dev_delay,
                 label = list(enjoy_school = "Child Enjoys School")) %>%
  add_p()

## i Column(s) enjoy_school are class "haven_labelled". This is an intermediate datastructure not meant for analysis. Convert columns with `haven::as_factor()`, `labelled::to_factor()`, `labelled::unlabelled()`, and `unclass()`. "haven_labelled" value labels are ignored when columns are not converted. Failure to convert may have unintended consequences or result in error.

## * https://haven.tidyverse.org/articles/semantics.html

## * https://larmarange.github.io/labelled/articles/intro_labelled.html#unlabelled

Characteristic	No, N = 49,002,465¹	Yes, N = 1,696,805¹	p-value²
Child Enjoys School			<0.001
0	4,781,404 (9.8%)	369,269 (22%)
1	44,221,061 (90%)	1,327,536 (78%)
¹ n (%) ² chi-squared test with Rao & Scott's second-order correction
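The prompt asks for each predictor's percentages within each level of the outcome, while the two tables above condition on the predictor instead. Because both tables print weighted counts, the requested column percentages can be recovered with simple arithmetic. A base-R sketch using only those printed counts (point estimates only, no standard errors); the chi-squared test of association is the same whichever variable goes in the columns, so the p-values above still apply.

```r
# Weighted counts from the two tables above:
# rows = predictor level, columns = enjoy_school (0 = No, 1 = Yes)
vol <- matrix(c(3716561, 26057092,
                1434112, 19491505),
              nrow = 2, byrow = TRUE,
              dimnames = list(adult_volunteer = c("No", "Yes"),
                              enjoy_school = c("0", "1")))

delay <- matrix(c(4781404, 44221061,
                  369269,  1327536),
                nrow = 2, byrow = TRUE,
                dimnames = list(dev_delay = c("No", "Yes"),
                                enjoy_school = c("0", "1")))

# Column percentages: distribution of each predictor within each outcome level
round(100 * prop.table(vol, margin = 2), 1)
round(100 * prop.table(delay, margin = 2), 1)
```

By these counts, about 43% of children who enjoy school live in a household with an adult school volunteer, versus about 28% of those who do not; about 2.9% of children who enjoy school have a diagnosed developmental delay, versus about 7.2% of those who do not.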

Are there substantive differences in the descriptive results between the analysis using survey design and that not using survey design?

Yes.

For both levels of adult_volunteer, the estimated proportion of students who enjoy school is higher with the survey design than without it (No: 0.875 vs. 0.860; Yes: 0.931 vs. 0.920).

For students with a diagnosed developmental delay, the estimated proportion who enjoy school is LOWER with the survey design than without it (0.782 vs. 0.803); for students without a diagnosed delay, it is higher (0.902 vs. 0.888).
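The size of these shifts can be computed directly from the four pairs of estimates printed in the tables above. A short base-R sketch (design-based minus unweighted, in percentage points):

```r
# Proportion enjoying school: full survey design vs. no design (from the tables above)
design    <- c(vol_no = 0.8751728, vol_yes = 0.9314662,
               delay_no = 0.9024252, delay_yes = 0.7823739)
no_design <- c(vol_no = 0.8596530, vol_yes = 0.9198554,
               delay_no = 0.8882637, delay_yes = 0.8034483)

# Positive = higher with the survey design; negative = lower
diff_pp <- 100 * (design - no_design)
round(diff_pp, 1)
```

All four shifts are within about 2 percentage points, and only the developmental-delay group moves downward.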

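As expected, the standard errors are larger with the survey design than without it in all four groups, as is typical when unequal weights are used. All four chi-squared tests remain significant at p < 0.001 in both analyses.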
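One way to quantify how much larger is the squared ratio of design-based to no-design standard errors. This is a rough, design-effect-style summary under that assumption, not a formal design effect, since the point estimates also differ slightly between the two analyses.

```r
# Standard errors of the proportion enjoying school, from the svyby tables above
se_design    <- c(vol_no = 0.0051769, vol_yes = 0.0038289,
                  delay_no = 0.0034473, delay_yes = 0.0245828)
se_no_design <- c(vol_no = 0.0036515, vol_yes = 0.0033327,
                  delay_no = 0.0025633, delay_yes = 0.0165013)

# Squared SE ratio: approximate variance inflation under the survey design
var_ratio <- (se_design / se_no_design)^2
round(var_ratio, 2)
```

The ratios run from about 1.3 (households with a volunteer) to about 2.2 (children with a diagnosed delay).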

Ryan Labio

2/7/2022