Introduction

Preschool is a form of early childhood education offered to children between three and five years of age, before they enter primary school. The program is designed to develop children’s cognitive and behavioral skills. The primary purpose of the analysis below was to see how a child’s age, sex, citizenship status and parents’ employment status predict the probability of his/her enrollment in a preschool program in New York. The data were drawn from the 2017 Public Use Microdata Sample (PUMS).

Variables

  (i) PRE_SCHG (Preschool Enrollment):
      0. Currently not enrolled in a preschool program
      1. Currently enrolled in a preschool program
              
  (ii) SEX:
      1. Male
      2. Female
      
  (iii) AGEP (Age):
      3. 3 years
      4. 4 years
      5. 5 years
      
  (iv) ESP (Employment Status of Parents):
      1. Living with two parents: both parents in labor force
      2. Living with two parents: father only in labor force
      3. Living with two parents: mother only in labor force
      4. Living with two parents: neither parent in labor force
      5. Living with father: father in labor force
      6. Living with father: father not in labor force
      7. Living with mother: mother in labor force
      8. Living with mother: mother not in labor force
      
  (v) CIT (Citizenship Status):
      1. Born in the U.S.
      2. Born in Puerto Rico, Guam, the U.S. Virgin Islands, or the Northern Marianas
      3. Born abroad of American parent(s)
      4. U.S. citizen by naturalization
      5. Not a citizen of the U.S.
      
      
library(readr)
PUMS_NY_2 <- read_csv("C:/Users/Nusrat/Desktop/MA - 3rd semester, Spring 19/SOC 791 - Independent Research (with Python)/PUMS Dataset/PUMS_NY_2.csv")
## Parsed with column specification:
## cols(
##   AGEP = col_double(),
##   SEX = col_double(),
##   SCH_TYPE = col_double(),
##   PRE_SCHG = col_double(),
##   CIT = col_double(),
##   DIS = col_double(),
##   ESP = col_double(),
##   NOP = col_double(),
##   OC = col_double(),
##   RAC1P = col_double(),
##   NATIVITY = col_double(),
##   RACASN = col_double(),
##   RACBLK = col_double(),
##   RACWHT = col_double()
## )
head(PUMS_NY_2)
## # A tibble: 6 x 14
##    AGEP   SEX SCH_TYPE PRE_SCHG   CIT   DIS   ESP   NOP    OC RAC1P
##   <dbl> <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     3     2        2        1     1     2     1     1     1     1
## 2     3     2        3        1     1     2     3     1     1     3
## 3     3     1        3        1     1     2     1     2     1     1
## 4     3     1        3        1     1     2     5     5     1     1
## 5     3     2        3        1     1     2     2     2     1     1
## 6     3     2        3        1     1     2     1     1     1     8
## # ... with 4 more variables: NATIVITY <dbl>, RACASN <dbl>, RACBLK <dbl>,
## #   RACWHT <dbl>

Logistic Regression

Model 1: AGE

library(Zelig)
## Loading required package: survival
m1 <- glm(PRE_SCHG ~ AGEP, family = binomial, data = PUMS_NY_2)
summary(m1)
## 
## Call:
## glm(formula = PRE_SCHG ~ AGEP, family = binomial, data = PUMS_NY_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8546  -1.1108   0.6282   0.9047   1.2455  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.68025    0.18132  -14.78   <2e-16 ***
## AGEP         0.84055    0.04974   16.90   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5901.7  on 4366  degrees of freedom
## Residual deviance: 5586.5  on 4365  degrees of freedom
##   (1646 observations deleted due to missingness)
## AIC: 5590.5
## 
## Number of Fisher Scoring iterations: 4

The analysis from model 1 suggests that age is a statistically significant predictor of preschool enrollment in NY (p < 0.001).

The result above corresponds to the following model: log(odds of enrollment) = -2.68 + 0.84 x AGEP.

This means that the log(odds) that a child is enrolled in a preschool program at an age of 0 years is -2.68. The negative value makes sense, as a newborn child cannot attend a preschool program, although age 0 lies outside the three-to-five range covered by the data, so the intercept is an extrapolation. The coefficient of AGEP suggests that each additional year of age increases the log(odds) of preschool enrollment by 0.84, which corresponds to multiplying the odds of enrollment by exp(0.84), or about 2.32. The coefficient is a change in log(odds), not a change in probability.
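As a worked illustration of this interpretation, the sketch below converts the model 1 coefficients into an odds ratio and into predicted probabilities for ages three to five. The coefficients are copied from the summary output above; the object names (`b0`, `b1`, `ages`, `prob`) are illustrative and not part of the original analysis.

```r
# Coefficients copied from summary(m1) above
b0 <- -2.68025  # intercept: log-odds of enrollment at AGEP = 0
b1 <- 0.84055   # change in log-odds per additional year of age

# Odds ratio for a one-year increase in age
odds_ratio <- exp(b1)

# Predicted probability of enrollment at ages 3, 4 and 5
ages <- 3:5
prob <- plogis(b0 + b1 * ages)  # plogis() is the inverse logit

round(odds_ratio, 2)
round(setNames(prob, ages), 2)
```

Under these coefficients the odds ratio is about 2.32, and the predicted probabilities are roughly 0.46, 0.66 and 0.82 at ages three, four and five. With the fitted object available, the same quantities should be obtainable with exp(coef(m1)) and predict(m1, newdata = data.frame(AGEP = 3:5), type = "response").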

Adding More Independent Variables

Model 2: Sex, Age, Citizenship Status and Parent’s Employment Status

m2 <- glm(PRE_SCHG ~ AGEP+SEX+CIT+ESP, family = binomial, data = PUMS_NY_2)
summary(m2)
## 
## Call:
## glm(formula = PRE_SCHG ~ AGEP + SEX + CIT + ESP, family = binomial, 
##     data = PUMS_NY_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9487  -1.1566   0.8342   0.9790   1.5321  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.53861    0.21960 -11.560  < 2e-16 ***
## AGEP         0.85991    0.05127  16.774  < 2e-16 ***
## SEX          0.06585    0.06546   1.006    0.314    
## CIT         -0.09109    0.05638  -1.616    0.106    
## ESP         -0.06503    0.01294  -5.026 5.01e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5677.4  on 4207  degrees of freedom
## Residual deviance: 5338.1  on 4203  degrees of freedom
##   (1805 observations deleted due to missingness)
## AIC: 5348.1
## 
## Number of Fisher Scoring iterations: 4

The analysis from model 2 suggests that both age and parents’ employment status are statistically significant predictors of preschool enrollment in NY (p < 0.001 for each). However, sex (p = 0.314) and citizenship status (p = 0.106) are not statistically significant at the 0.05 level.

The result above corresponds to the following model: log(odds of enrollment) = -2.54 + 0.86 x AGEP + 0.07 x SEX - 0.09 x CIT - 0.07 x ESP.

This means that when all predictors are equal to 0, the log(odds) that the child is enrolled in a preschool program is -2.54 (slightly higher than in model 1). This combination does not occur in the data, since SEX, CIT and ESP all start at 1. The coefficient of AGEP suggests that, holding everything else constant, each additional year of age increases the log(odds) of preschool enrollment by 0.86, i.e. multiplies the odds by about 2.36. The coefficient of ESP suggests that each one-unit increase in the ESP code decreases the log(odds) of enrollment by 0.07, i.e. multiplies the odds by about 0.94. Because ESP is a categorical code entered as a number, this treats its eight categories as evenly spaced steps, so the coefficient should be read with caution.
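Since SEX, CIT and ESP are category codes, one alternative is to enter them as factors so that each category gets its own coefficient relative to a reference level. The sketch below shows the mechanics on simulated toy data only; the data frame `toy`, the column `ESP_f` and the model `m_toy` are hypothetical names, and the simulated values say nothing about the PUMS results.

```r
# Toy data for illustration only (simulated, NOT the PUMS sample)
set.seed(1)
toy <- data.frame(
  PRE_SCHG = rbinom(200, 1, 0.6),
  AGEP     = sample(3:5, 200, replace = TRUE),
  ESP      = sample(1:8, 200, replace = TRUE)
)

# Convert the numeric code to a factor; level 1 becomes the reference category
toy$ESP_f <- factor(toy$ESP, levels = 1:8)

# Each non-reference ESP category now gets its own coefficient
m_toy <- glm(PRE_SCHG ~ AGEP + ESP_f, family = binomial, data = toy)
names(coef(m_toy))
```

Applied to the real data, this would amount to replacing ESP with factor(ESP) in the model formula, at the cost of seven coefficients instead of one.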

Model 3: Adding Interaction Between Sex and Age

m3 <- glm(PRE_SCHG ~ SEX*AGEP+ESP+CIT, family = binomial, data = PUMS_NY_2)
summary(m3)
## 
## Call:
## glm(formula = PRE_SCHG ~ SEX * AGEP + ESP + CIT, family = binomial, 
##     data = PUMS_NY_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9642  -1.1395   0.8446   0.9907   1.5512  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.30418    0.59306  -5.571 2.53e-08 ***
## SEX          0.57789    0.37290   1.550    0.121    
## AGEP         1.07385    0.16244   6.611 3.83e-11 ***
## ESP         -0.06515    0.01294  -5.035 4.79e-07 ***
## CIT         -0.09130    0.05633  -1.621    0.105    
## SEX:AGEP    -0.14291    0.10245  -1.395    0.163    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5677.4  on 4207  degrees of freedom
## Residual deviance: 5336.1  on 4202  degrees of freedom
##   (1805 observations deleted due to missingness)
## AIC: 5348.1
## 
## Number of Fisher Scoring iterations: 4

Result: The interaction between sex and age for the participants in this dataset is not statistically significant (p = 0.163).
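The same conclusion can be checked with a likelihood ratio test, since model 2 is nested in model 3 and both were fitted to the same 4208 observations. The sketch below uses the residual degrees of freedom from the summaries above and the two-decimal deviances reported in the model comparison further down; the object names are illustrative.

```r
# Residual deviances and residual degrees of freedom for models 2 and 3
dev_m2 <- 5338.07; df_m2 <- 4203
dev_m3 <- 5336.13; df_m3 <- 4202

# Likelihood ratio statistic: drop in deviance from adding SEX:AGEP
lr_stat <- dev_m2 - dev_m3
lr_df   <- df_m2 - df_m3

# Upper-tail chi-squared probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = round(p_value, 3))
```

The statistic is 1.94 on 1 degree of freedom, giving a p-value of about 0.16, in line with the Wald test in the summary. With the fitted objects available, anova(m2, m3, test = "Chisq") should give the same comparison directly.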

Information Criteria: AIC, BIC

library(texreg)
## Warning: package 'texreg' was built under R version 3.5.3
## Version:  1.36.23
## Date:     2017-03-03
## Author:   Philip Leifeld (University of Glasgow)
## 
## Please cite the JSS article in your publications -- see citation("texreg").
htmlreg(list(m1,m2,m3),doctype = FALSE)
Statistical models

                  Model 1      Model 2      Model 3
(Intercept)       -2.68***     -2.54***     -3.30***
                  (0.18)       (0.22)       (0.59)
AGEP               0.84***      0.86***      1.07***
                  (0.05)       (0.05)       (0.16)
SEX                             0.07         0.58
                               (0.07)       (0.37)
CIT                            -0.09        -0.09
                               (0.06)       (0.06)
ESP                            -0.07***     -0.07***
                               (0.01)       (0.01)
SEX:AGEP                                    -0.14
                                            (0.10)
AIC              5590.51      5348.07      5348.13
BIC              5603.28      5379.80      5386.20
Log Likelihood  -2793.26     -2669.04     -2668.06
Deviance         5586.51      5338.07      5336.13
Num. obs.        4367         4208         4208

*** p < 0.001, ** p < 0.01, * p < 0.05

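When the models are compared side by side, the most complex model, i.e. model 3, has the lowest deviance (5336.13) and the highest log likelihood (-2668.06). However, deviance can only decrease when a term is added, so this alone does not show that model 3 is the best fit.

To compare the models while penalizing complexity, AIC and BIC are used. Both the AIC and BIC values are the lowest for model 2 (AIC: 5348.07, BIC: 5379.80), indicating that model 2 is the best fit of the three models, although its AIC is nearly identical to that of model 3 (5348.13). Note that model 1 was fitted to 4367 observations rather than 4208, so its AIC and BIC are not strictly comparable with those of models 2 and 3.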
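The AIC and BIC values can be reproduced from the reported log likelihoods using AIC = 2k - 2 logLik and BIC = k log(n) - 2 logLik, where k is the number of estimated coefficients and n the number of observations. The sketch below is a check on the table, with illustrative object names; small differences in the second decimal come from the rounded log likelihoods.

```r
# Log likelihoods, coefficient counts and sample sizes from the table above
log_lik <- c(m1 = -2793.26, m2 = -2669.04, m3 = -2668.06)
k       <- c(m1 = 2,        m2 = 5,        m3 = 6)
n       <- c(m1 = 4367,     m2 = 4208,     m3 = 4208)

# Information criteria: lower values indicate a better trade-off
aic <- 2 * k - 2 * log_lik
bic <- k * log(n) - 2 * log_lik

round(rbind(AIC = aic, BIC = bic), 2)
```

BIC penalizes each extra coefficient by log(n), about 8.3 here, rather than by 2, which is why the gap between models 2 and 3 is larger in BIC than in AIC.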

Visualization

library(visreg)
## Warning: package 'visreg' was built under R version 3.5.3
visreg(m2,"AGEP", scale="response")

The graph above shows how the predicted probability of preschool enrollment varies with age. As age increases, the predicted probability of being enrolled in a preschool program increases.

library(visreg)
visreg(m2, "AGEP", by = "SEX", overlay = TRUE)

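In the overlay, the fitted lines for male and female children are nearly identical. Because model 2 contains no interaction term, the two lines are parallel on the log(odds) scale by construction, so this plot reflects the small, non-significant effect of sex rather than testing the interaction between age and sex.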

library(visreg)
visreg(m3,"SEX", by = "AGEP", scale="response")

When sex is plotted by age using model 3, a more specific pattern emerges. Female children seem to be enrolled in preschool slightly more than male children at the age of three. The difference is negligible for four-year-olds. Interestingly, however, male children seem to be enrolled in preschool slightly more than female children at the age of five. Since the interaction term is not statistically significant, these differences should be interpreted with caution.
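The direction of this pattern follows from the model 3 coefficients: the change in log(odds) when SEX goes from 1 (male) to 2 (female) at a given age is the SEX coefficient plus the SEX:AGEP coefficient times age. The sketch below computes this gap at ages three to five; the coefficients are copied from the summary of model 3 and the object names are illustrative.

```r
# Coefficients copied from summary(m3) above
b_sex     <- 0.57789   # SEX main effect
b_sex_age <- -0.14291  # SEX:AGEP interaction

ages <- 3:5

# Female-minus-male difference in log-odds of enrollment at each age
sex_gap <- setNames(b_sex + b_sex_age * ages, ages)
round(sex_gap, 3)
```

The gap is positive at age three, close to zero at age four and negative at age five, which matches the crossing pattern in the plot. These are point estimates only and do not account for the uncertainty in either coefficient.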