Research Question:

How does zip code influence child care quality?

Introduction

I am interested in understanding the relationship between zip code and the quality of child care programming throughout the New York City boroughs. Prior research indicates a strong correlation between zip code and both school quality and educational attainment: zip codes can help predict test scores, college graduation rates, and overall academic progress. For many New York parents with long and inflexible working hours, child care services are not optional. Parents may rely on child care programs to meet their child’s academic, social, and emotional needs. Child care facilities are usually close to home, which avoids an inconvenient trip from school to the program and from the program to the household. I want to determine whether zip code can predict the public health hazard violation rate. I would also like to see how age range, together with zip code, relates to the dependent variable, public health hazard violation rate.

Dataset

The “childcare” dataset was downloaded from Kaggle; it consists of inspections conducted, and violations found, by the Department of Health and Mental Hygiene at child care programs throughout the New York City boroughs.

library(readr)
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0       v purrr   0.3.1  
## v tibble  2.0.1       v dplyr   0.8.0.1
## v tidyr   0.8.3       v stringr 1.4.0  
## v ggplot2 3.1.0       v forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(Zelig)
## Loading required package: survival
## 
## Attaching package: 'Zelig'
## The following object is masked from 'package:purrr':
## 
##     reduce
## The following object is masked from 'package:ggplot2':
## 
##     stat
library(pander)
library(texreg)
## Version:  1.36.23
## Date:     2017-03-03
## Author:   Philip Leifeld (University of Glasgow)
## 
## Please cite the JSS article in your publications -- see citation("texreg").
## 
## Attaching package: 'texreg'
## The following object is masked from 'package:tidyr':
## 
##     extract
library(visreg)
library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
childcare <- read_csv("C:/Users/Skippz/Desktop/dohmh-childcare-center-inspections.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   ZipCode = col_double(),
##   `Permit Number` = col_double(),
##   `Permit Expiration` = col_datetime(format = ""),
##   `Maximum Capacity` = col_double(),
##   `Building Identification Number` = col_double(),
##   `Date Permitted` = col_datetime(format = ""),
##   `Violation Rate Percent` = col_double(),
##   `Average Violation Rate Percent` = col_double(),
##   `Total Educational Workers` = col_double(),
##   `Average Total Educational Workers` = col_double(),
##   `Public Health Hazard Violation Rate` = col_double(),
##   `Average Public Health Hazard Violation Rate` = col_double(),
##   `Critical Violation Rate` = col_double(),
##   `Average Critical Violation Rate` = col_double(),
##   `Inspection Date` = col_datetime(format = "")
## )
## See spec(...) for full column specifications.
head(childcare)
## # A tibble: 6 x 34
##   `Center Name` `Legal Name` Building Street Borough ZipCode Phone
##   <chr>         <chr>        <chr>    <chr>  <chr>     <dbl> <chr>
## 1 ESF, INC      ESF Camp Ri~ 1        SPAUL~ BRONX     10471 718-~
## 2 ALL SEASONS ~ ALL SEASONS~ 190      E 162~ BRONX     10451 914-~
## 3 ALL SEASONS ~ ALL SEASONS~ 190      E 162~ BRONX     10451 914-~
## 4 ALL SEASONS ~ ALL SEASONS~ 190      E 162~ BRONX     10451 914-~
## 5 ALL SEASONS ~ ALL SEASONS~ 190      E 162~ BRONX     10451 914-~
## 6 YESHIVAT OHR~ YESHIVAT OH~ 86-06    135TH~ QUEENS    11418 718-~
## # ... with 27 more variables: `Permit Number` <dbl>, `Permit
## #   Expiration` <dttm>, Status <chr>, `Age Range` <chr>, `Maximum
## #   Capacity` <dbl>, `Day Care ID` <chr>, `Program Type` <chr>, `Facility
## #   Type` <chr>, `Child Care Type` <chr>, `Building Identification
## #   Number` <dbl>, URL <chr>, `Date Permitted` <dttm>, Actual <chr>,
## #   `Violation Rate Percent` <dbl>, `Average Violation Rate
## #   Percent` <dbl>, `Total Educational Workers` <dbl>, `Average Total
## #   Educational Workers` <dbl>, `Public Health Hazard Violation
## #   Rate` <dbl>, `Average Public Health Hazard Violation Rate` <dbl>,
## #   `Critical Violation Rate` <dbl>, `Average Critical Violation
## #   Rate` <dbl>, `Inspection Date` <dttm>, `Regulation Summary` <chr>,
## #   `Violation Category` <chr>, `Health Code Sub Section` <chr>,
## #   `Violation Status` <chr>, `Inspection Summary Result` <chr>
qualitycc<-sjlabelled::remove_all_labels(childcare)
head(qualitycc)
##                     Center.Name                Legal.Name Building
## 1                      ESF, INC        ESF Camp Riverdale        1
## 2 ALL SEASONS ABC DAY CARE, LLC ALL SEASONS DAY CARE, LLC      190
## 3 ALL SEASONS ABC DAY CARE, LLC ALL SEASONS DAY CARE, LLC      190
## 4 ALL SEASONS ABC DAY CARE, LLC ALL SEASONS DAY CARE, LLC      190
## 5 ALL SEASONS ABC DAY CARE, LLC ALL SEASONS DAY CARE, LLC      190
## 6            YESHIVAT OHR HAIIM        YESHIVAT OHR HAIIM    86-06
##           Street Borough ZipCode        Phone Permit.Number
## 1 SPAULDING LANE   BRONX   10471 718-432-1013        104551
## 2     E 162ND ST   BRONX   10451 914-490-5231        104386
## 3     E 162ND ST   BRONX   10451 914-490-5231        104386
## 4     E 162ND ST   BRONX   10451 914-490-5231        104386
## 5     E 162ND ST   BRONX   10451 914-490-5231        104386
## 6   135TH STREET  QUEENS   11418 718-658-7066            NA
##   Permit.Expiration             Status          Age.Range Maximum.Capacity
## 1        2018-09-15 Expired-In Renewal 0 YEARS - 16 YEARS                0
## 2        2020-04-26          Permitted  0 YEARS - 2 YEARS               14
## 3        2020-04-26          Permitted  0 YEARS - 2 YEARS               14
## 4        2020-04-26          Permitted  0 YEARS - 2 YEARS               14
## 5        2020-04-26          Permitted  0 YEARS - 2 YEARS               14
## 6        2115-01-23             Active  3 YEARS - 5 YEARS                0
##   Day.Care.ID   Program.Type Facility.Type               Child.Care.Type
## 1     DC37193   ALL AGE CAMP          Camp                          Camp
## 2     DC35866 INFANT TODDLER           GDC Child Care - Infants/Toddlers
## 3     DC35866 INFANT TODDLER           GDC Child Care - Infants/Toddlers
## 4     DC35866 INFANT TODDLER           GDC Child Care - Infants/Toddlers
## 5     DC35866 INFANT TODDLER           GDC Child Care - Infants/Toddlers
## 6     DC20398      PRESCHOOL          SBCC       School Based Child Care
##   Building.Identification.Number                        URL
## 1                        2090707 www.esfcamps.com/riverdale
## 2                        2002804   www.allseasondaycare.com
## 3                        2002804   www.allseasondaycare.com
## 4                        2002804   www.allseasondaycare.com
## 5                        2002804   www.allseasondaycare.com
## 6                        4206444                       <NA>
##        Date.Permitted Actual Violation.Rate.Percent
## 1 2018-08-07 11:38:39      Y                     NA
## 2 2018-04-26 16:28:50      Y                 0.0000
## 3 2018-04-26 16:28:50      Y                 0.0000
## 4 2018-04-26 16:28:50      Y                 0.0000
## 5 2018-04-26 16:28:50      Y                 0.0000
## 6                <NA>   <NA>                66.6667
##   Average.Violation.Rate.Percent Total.Educational.Workers
## 1                             NA                         0
## 2                             NA                         0
## 3                             NA                         0
## 4                             NA                         0
## 5                             NA                         0
## 6                             NA                         0
##   Average.Total.Educational.Workers Public.Health.Hazard.Violation.Rate
## 1                                 1                                  NA
## 2                                 0                              0.0000
## 3                                 0                              0.0000
## 4                                 0                              0.0000
## 5                                 0                              0.0000
## 6                                 0                             33.3333
##   Average.Public.Health.Hazard.Violation.Rate Critical.Violation.Rate
## 1                                          NA                      NA
## 2                                          NA                  0.0000
## 3                                          NA                  0.0000
## 4                                          NA                  0.0000
## 5                                          NA                  0.0000
## 6                                          NA                 66.6667
##   Average.Critical.Violation.Rate Inspection.Date
## 1                              NA            <NA>
## 2                              NA      2018-12-08
## 3                              NA      2018-12-08
## 4                              NA      2018-08-01
## 5                              NA      2018-05-23
## 6                              NA      2019-01-31
##                                                                                                                         Regulation.Summary
## 1                                                                                                                                     <NA>
## 2                                                              There were no new violations observed at the time of this inspection/visit.
## 3                                                              There were no new violations observed at the time of this inspection/visit.
## 4                                                              There were no new violations observed at the time of this inspection/visit.
## 5                     At time of inspection floors/walls ceilings were observed not maintained; in disrepair or covered in a toxic finish.
## 6 All staff who will have unsupervised contact with children has undergone child abuse and criminal justice screening and reference checks
##   Violation.Category Health.Code.Sub.Section Violation.Status
## 1               <NA>                    <NA>             <NA>
## 2               <NA>                    <NA>              N/A
## 3               <NA>                    <NA>              N/A
## 4               <NA>                    <NA>              N/A
## 5            GENERAL                47.41(j)        CORRECTED
## 6            GENERAL                43.13(a)        CORRECTED
##                                                                                      Inspection.Summary.Result
## 1                                                                                                         <NA>
## 2                                     Monitoring Inspection Non-Routine - Passed inspection with no violations
## 3 Compliance Inspection of Open Violations - Reinspection Required; Violations corrected at time of inspection
## 4                                             Initial Annual Inspection - Passed inspection with no violations
## 5                                                        Initial Annual Inspection - Reinspection Not Required
## 6                                                            Initial Annual Inspection - Reinspection Required

Model 1: Zip Code & Public Health Hazard Violation Rate

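Model 1 has an intercept of 28.9310850, which is the predicted public health hazard violation rate when ZipCode equals 0. Because no New York City zip code is anywhere near 0, the intercept has no practical interpretation on its own. ZipCode enters the model as a number, so the slope means that each one-unit increase in zip code is associated with a 0.0002838-unit decrease in the public health hazard violation rate. This relationship is not statistically significant (p = 0.137), and the R-squared of 3.967e-05 indicates that zip code, treated this way, explains almost none of the variation in the violation rate.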

model1 <- lm(Public.Health.Hazard.Violation.Rate ~ ZipCode, data = qualitycc)
summary(model1)
## 
## Call:
## lm(formula = Public.Health.Hazard.Violation.Rate ~ ZipCode, data = qualitycc)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.093 -25.706  -5.742  13.992  74.314 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.9310850  2.0730054  13.956   <2e-16 ***
## ZipCode     -0.0002838  0.0001910  -1.486    0.137    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.34 on 55630 degrees of freedom
##   (6292 observations deleted due to missingness)
## Multiple R-squared:  3.967e-05,  Adjusted R-squared:  2.169e-05 
## F-statistic: 2.207 on 1 and 55630 DF,  p-value: 0.1374
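To see how small the Model 1 slope is in practice, the fitted line can be evaluated by hand. The sketch below hard-codes the two coefficients from the summary output and uses two zip codes that appear in the first rows of the data (10451 and 11418). The variable names are illustrative and are not part of the original analysis.

```r
# Coefficients copied from the Model 1 summary output
b0 <- 28.9310850   # intercept
b1 <- -0.0002838   # slope for ZipCode (treated as a plain number)

# Two zip codes that appear in head(childcare)
zips <- c(bronx = 10451, queens = 11418)

# Predicted public health hazard violation rate at each zip code
pred <- b0 + b1 * zips
print(round(pred, 2))

# Difference implied by moving from 10451 to 11418
gap <- unname(pred["queens"] - pred["bronx"])
print(round(gap, 2))
```

With these inputs the two predictions differ by about a quarter of a point (25.97 versus 25.69), which is tiny next to the residual standard error of 24.34.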

Model 2: Adding an Independent Variable (Age Range)

According to Model 2, relative to the reference age range (0 - 16 years, the level omitted from the coefficient list) and holding zip code constant, the public health hazard violation rate is 6.66 units lower for age range 0 - 2 years and 1.32 units lower for age range 2 - 5 years. For age range 3 - 5 years, the rate is 13.89 units higher, and for age range 6 - 16 years it is 25.08 units higher. The increase for ages 3 - 5 may be due to children crossing a significant mental and social growth threshold: they vocalize their feelings and act out more in this age range, and child care providers may not be sufficiently trained or up-to-date with resources to handle the needs of this age group.

model2 <- lm(Public.Health.Hazard.Violation.Rate ~ ZipCode + Age.Range, data = qualitycc)
summary(model2)
## 
## Call:
## lm(formula = Public.Health.Hazard.Violation.Rate ~ ZipCode + 
##     Age.Range, data = qualitycc)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.899 -21.246  -5.173  13.281  74.611 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 36.3838919  2.2049548  16.501  < 2e-16 ***
## ZipCode                     -0.0008456  0.0001981  -4.269 1.97e-05 ***
## Age.Range0 YEARS - 2 YEARS  -6.6634331  0.5452699 -12.220  < 2e-16 ***
## Age.Range2 YEARS - 5 YEARS  -1.3249315  0.4737039  -2.797  0.00516 ** 
## Age.Range3 YEARS - 5 YEARS  13.8904813  0.6880768  20.187  < 2e-16 ***
## Age.Range6 YEARS - 16 YEARS 25.0793438  1.8750443  13.375  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.19 on 51877 degrees of freedom
##   (10041 observations deleted due to missingness)
## Multiple R-squared:  0.02664,    Adjusted R-squared:  0.02654 
## F-statistic: 283.9 on 5 and 51877 DF,  p-value: < 2.2e-16
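The Age.Range coefficients in Model 2 are offsets from a reference level, which is easier to see by computing predictions for one zip code. The sketch below copies the coefficients from the summary output and assumes that "0 YEARS - 16 YEARS" is the reference level, since it appears in the data but not in the coefficient list. The zip code 10451 is only an illustrative choice.

```r
# Coefficients copied from the Model 2 summary output
intercept <- 36.3838919
b_zip     <- -0.0008456

# Age-range offsets; the reference level is assumed to be 0 - 16 years
age_offsets <- c(
  "0 YEARS - 16 YEARS" = 0,
  "0 YEARS - 2 YEARS"  = -6.6634331,
  "2 YEARS - 5 YEARS"  = -1.3249315,
  "3 YEARS - 5 YEARS"  = 13.8904813,
  "6 YEARS - 16 YEARS" = 25.0793438
)

zip <- 10451  # illustrative zip code from the first rows of the data

# Predicted rate = intercept + slope * zip + age-range offset
pred_by_age <- intercept + b_zip * zip + age_offsets
print(round(pred_by_age, 2))
```

Each prediction differs from the reference-level prediction by exactly the corresponding coefficient, which is what "a decrease of 6.66 units" means in the paragraph above.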

Model 3: Interaction between Zip Code & Critical Violation Rate

model3 <- lm(Public.Health.Hazard.Violation.Rate ~ ZipCode*Critical.Violation.Rate + Age.Range + Violation.Status, data = qualitycc)
summary(model3)
## 
## Call:
## lm(formula = Public.Health.Hazard.Violation.Rate ~ ZipCode * 
##     Critical.Violation.Rate + Age.Range + Violation.Status, data = qualitycc)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -75.806 -13.711  -3.338  11.255  96.637 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -1.206e+01  3.786e+00  -3.185  0.00145 ** 
## ZipCode                          2.258e-03  3.456e-04   6.534 6.45e-11 ***
## Critical.Violation.Rate          1.277e+00  7.733e-02  16.511  < 2e-16 ***
## Age.Range0 YEARS - 2 YEARS      -6.932e+00  4.861e-01 -14.259  < 2e-16 ***
## Age.Range2 YEARS - 5 YEARS      -3.184e+00  4.230e-01  -7.527 5.26e-14 ***
## Age.Range3 YEARS - 5 YEARS       1.156e+01  6.141e-01  18.819  < 2e-16 ***
## Age.Range6 YEARS - 16 YEARS      1.917e+01  1.671e+00  11.470  < 2e-16 ***
## Violation.StatusMORE INFO        3.812e-01  6.600e-01   0.578  0.56349    
## Violation.StatusN/A             -5.022e+00  2.267e-01 -22.150  < 2e-16 ***
## Violation.StatusOPEN             2.889e+00  7.399e-01   3.904 9.47e-05 ***
## ZipCode:Critical.Violation.Rate -8.173e-05  7.102e-06 -11.508  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.51 on 51872 degrees of freedom
##   (10041 observations deleted due to missingness)
## Multiple R-squared:  0.2307, Adjusted R-squared:  0.2305 
## F-statistic:  1555 on 10 and 51872 DF,  p-value: < 2.2e-16
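Because Model 3 includes a ZipCode by Critical.Violation.Rate interaction, the slope of Critical.Violation.Rate is not a single number: it changes with zip code. A minimal sketch of that calculation, using the rounded coefficients from the output above and two illustrative zip codes from the first rows of the data (the helper `crit_slope` is hypothetical):

```r
# Coefficients copied (as rounded) from the Model 3 summary output
b_crit        <- 1.277       # Critical.Violation.Rate
b_interaction <- -8.173e-05  # ZipCode:Critical.Violation.Rate

# Slope of Critical.Violation.Rate at a given zip code:
# main effect plus interaction coefficient times the zip code
crit_slope <- function(zip) b_crit + b_interaction * zip

zips   <- c(10451, 11418)
slopes <- crit_slope(zips)
print(round(slopes, 3))
```

Under these numbers, a one-unit rise in the critical violation rate goes with a rise of roughly 0.42 units in the public health hazard violation rate at zip code 10451 and roughly 0.34 units at 11418. The main-effect coefficient of 1.277 on its own describes the slope only at ZipCode = 0.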

Comparing Models

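Model 3 is the best fit of the three models. It has the highest adjusted R-squared (0.23, compared with 0.00 for Model 1 and 0.03 for Model 2) and the lowest RMSE (21.51, compared with 24.34 and 24.19). R-squared remains relatively low even so: Model 3 still leaves most of the variation in the public health hazard violation rate unexplained.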

table <- htmlreg(list(model1, model2, model3), doctype=FALSE)
pander(table)
Statistical models
                                 Model 1    Model 2    Model 3
(Intercept)                      28.93***   36.38***   -12.06**
                                 (2.07)     (2.20)     (3.79)
ZipCode                          -0.00      -0.00***   0.00***
                                 (0.00)     (0.00)     (0.00)
Age.Range0 YEARS - 2 YEARS                  -6.66***   -6.93***
                                            (0.55)     (0.49)
Age.Range2 YEARS - 5 YEARS                  -1.32**    -3.18***
                                            (0.47)     (0.42)
Age.Range3 YEARS - 5 YEARS                  13.89***   11.56***
                                            (0.69)     (0.61)
Age.Range6 YEARS - 16 YEARS                 25.08***   19.17***
                                            (1.88)     (1.67)
Critical.Violation.Rate                                1.28***
                                                       (0.08)
Violation.StatusMORE INFO                              0.38
                                                       (0.66)
Violation.StatusN/A                                    -5.02***
                                                       (0.23)
Violation.StatusOPEN                                   2.89***
                                                       (0.74)
ZipCode:Critical.Violation.Rate                        -0.00***
                                                       (0.00)
R2                               0.00       0.03       0.23
Adj. R2                          0.00       0.03       0.23
Num. obs.                        55632      51883      51883
RMSE                             24.34      24.19      21.51
*** p < 0.001, ** p < 0.01, * p < 0.05
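The comparison rests on adjusted R-squared, which penalizes R-squared for the number of predictors. As a rough check, the adjusted values can be reproduced from the R-squared, the number of observations, and the number of estimated slopes reported for each model. The helper `adj_r2` below is a hypothetical name, and the inputs are the rounded figures from the output, so small rounding differences are expected.

```r
# Adjusted R-squared from R-squared, sample size n, and number of slopes p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Figures copied from the three summary outputs
fits <- data.frame(
  model = c("Model 1", "Model 2", "Model 3"),
  r2    = c(3.967e-05, 0.02664, 0.2307),
  n     = c(55632, 51883, 51883),
  p     = c(1, 5, 10)
)

fits$adj_r2 <- adj_r2(fits$r2, fits$n, fits$p)
print(fits)
```

With more than 50,000 observations the penalty is very small, which is why R2 and Adj. R2 look identical at two decimal places in the table.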

Graphs

visreg(model1, "ZipCode", scale = "response")

visreg(model2,"ZipCode", by = "Age.Range", scale = "response")