Introduction:

https://data.wa.gov/education/Childcare-Need-Supply-All-/hiqz-y2vv/data_preview

The Childcare Need and Supply Dataset provides information on the demand and supply of childcare services across various geographic locations, age groups, and income brackets. The dataset offers key metrics such as:

Subsidized care, private care estimates, and unserved children.
Percent of childcare need met, segmented by income bracket and age group.

# Load the necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset
childcare_data <- read.csv("/Users/aribarazzaq/Desktop/Childcare_Need___Supply__All__20241012.csv")

# Preview the structure
str(childcare_data)
## 'data.frame':    15632 obs. of  10 variables:
##  $ Geographic.Unit                       : chr  "County" "County" "County" "County" ...
##  $ Geographic.ID                         : int  53001 53001 53001 53001 53001 53001 53001 53001 53001 53001 ...
##  $ Geographic.Name                       : chr  "Adams County" "Adams County" "Adams County" "Adams County" ...
##  $ State.Median.Income.Bracket           : chr  "<=60% of SMI" ">60% and <=75% of SMI" ">75% and <=85% of SMI" ">85% of SMI" ...
##  $ Age.Group                             : chr  "Infant" "Infant" "Infant" "Infant" ...
##  $ Childcare.Subsidized                  : int  18 0 0 0 220 0 0 NA 154 0 ...
##  $ Private.Care.Estimate                 : int  14 0 0 4 23 2 0 11 58 5 ...
##  $ Estimated.Children.Receiving.Childcare: int  32 0 0 4 243 2 0 NA 212 5 ...
##  $ Estimate.of.Unserved                  : int  245 14 8 38 469 60 20 100 2178 219 ...
##  $ Percent.Need.Met                      : num  11.6 0 0 9.5 34.1 3.2 0 NA 8.9 2.2 ...
# Rename columns for better readability
childcare_data <- childcare_data %>%
  rename(
    geographic_unit = Geographic.Unit,
    geographic_id = Geographic.ID,
    geographic_name = Geographic.Name,
    income_bracket = State.Median.Income.Bracket,
    age_group = Age.Group,
    subsidized_care = Childcare.Subsidized,
    private_care = Private.Care.Estimate,
    children_receiving_care = Estimated.Children.Receiving.Childcare,
    unserved_children = Estimate.of.Unserved,
    percent_need_met = Percent.Need.Met
  )
# Check for missing values
sum(is.na(childcare_data))
## [1] 8272
# Drop rows with missing values, if any
childcare_data <- childcare_data %>% drop_na()
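
The total of 8272 counts missing cells, not rows, so on its own it does not show which columns are affected or how many rows `drop_na()` removes. A minimal sketch of a per-column check, run on a small toy data frame (the `toy` values are invented for illustration and are not taken from the dataset):

```r
library(dplyr)
library(tidyr)

# Toy data frame; values are illustrative only
toy <- tibble(
  subsidized_care  = c(18L, 0L, NA, 154L),
  private_care     = c(14L, 0L, 11L, 58L),
  percent_need_met = c(11.6, 0, NA, 8.9)
)

# Missing cells per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row counts before and after dropping incomplete rows
rows_before <- nrow(toy)
rows_after  <- nrow(drop_na(toy))
print(c(before = rows_before, after = rows_after))
```

Comparing the two row counts on the real data would show how much of the file the later summaries are based on.
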
# Transform care-related metrics into long format
childcare_long <- childcare_data %>%
  pivot_longer(
    cols = subsidized_care:percent_need_met,
    names_to = "care_type",
    values_to = "value"
  )

# Preview the long format data
head(childcare_long)
## # A tibble: 6 × 7
##   geographic_unit geographic_id geographic_name income_bracket        age_group
##   <chr>                   <int> <chr>           <chr>                 <chr>    
## 1 County                  53001 Adams County    <=60% of SMI          Infant   
## 2 County                  53001 Adams County    <=60% of SMI          Infant   
## 3 County                  53001 Adams County    <=60% of SMI          Infant   
## 4 County                  53001 Adams County    <=60% of SMI          Infant   
## 5 County                  53001 Adams County    <=60% of SMI          Infant   
## 6 County                  53001 Adams County    >60% and <=75% of SMI Infant   
## # ℹ 2 more variables: care_type <chr>, value <dbl>
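
The pivot stacks four count columns and one percentage column into a single `value` column, which is why `value` is reported as `<dbl>` even though the counts were integers. A small sketch of the same reshape on toy data (the `toy` values are invented for illustration), with `pivot_wider()` shown as the reverse step:

```r
library(dplyr)
library(tidyr)

# Toy data frame; values are illustrative only
toy <- tibble(
  age_group        = c("Infant", "Toddler"),
  subsidized_care  = c(18L, 220L),
  private_care     = c(14L, 23L),
  percent_need_met = c(11.6, 34.1)
)

# Each input row becomes one row per pivoted column
toy_long <- toy %>%
  pivot_longer(
    cols = subsidized_care:percent_need_met,
    names_to = "care_type",
    values_to = "value"
  )
print(toy_long)

# pivot_wider() reverses the reshape (values stay double)
toy_wide <- toy_long %>%
  pivot_wider(names_from = care_type, values_from = value)
print(toy_wide)
```

Because counts and percentages now share one column, any summary over `value` should be grouped by `care_type` so the two kinds of numbers are never mixed.
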
# Summarize care metrics by age group
age_group_summary <- childcare_long %>%
  group_by(age_group, care_type) %>%
  summarise(avg_value = mean(value, na.rm = TRUE))
## `summarise()` has grouped output by 'age_group'. You can override using the
## `.groups` argument.
print(age_group_summary)
## # A tibble: 20 × 3
## # Groups:   age_group [4]
##    age_group  care_type               avg_value
##    <chr>      <chr>                       <dbl>
##  1 Infant     children_receiving_care     14.5 
##  2 Infant     percent_need_met             8.63
##  3 Infant     private_care                11.0 
##  4 Infant     subsidized_care              3.51
##  5 Infant     unserved_children           92.3 
##  6 Preschool  children_receiving_care     97.1 
##  7 Preschool  percent_need_met            21.2 
##  8 Preschool  private_care                41.5 
##  9 Preschool  subsidized_care             55.6 
## 10 Preschool  unserved_children          184.  
## 11 School Age children_receiving_care     64.3 
## 12 School Age percent_need_met             5.27
## 13 School Age private_care                41.0 
## 14 School Age subsidized_care             23.3 
## 15 School Age unserved_children          585.  
## 16 Toddler    children_receiving_care     38.6 
## 17 Toddler    percent_need_met            12.2 
## 18 Toddler    private_care                22.2 
## 19 Toddler    subsidized_care             16.4 
## 20 Toddler    unserved_children          148.
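
The `summarise()` message above notes that the result is still grouped by `age_group`. One way to get an ungrouped result and silence the message is the `.groups = "drop"` argument. A minimal sketch on toy data (the `toy_long` values are invented for illustration):

```r
library(dplyr)

# Toy long-format data; values are illustrative only
toy_long <- tibble(
  age_group = c("Infant", "Infant", "Toddler", "Toddler"),
  care_type = c("private_care", "private_care", "private_care", "subsidized_care"),
  value     = c(10, 20, 30, 40)
)

# .groups = "drop" returns an ungrouped tibble
toy_summary <- toy_long %>%
  group_by(age_group, care_type) %>%
  summarise(avg_value = mean(value, na.rm = TRUE), .groups = "drop")

print(toy_summary)
```
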
# Create a bar plot of average percent need met by income bracket
ggplot(childcare_long %>% filter(care_type == "percent_need_met"), 
       aes(x = income_bracket, y = value, fill = income_bracket)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Average Percent of Childcare Need Met by Income Bracket",
       x = "Income Bracket", y = "Percent Need Met") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
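
With `geom_bar(stat = "summary", fun = "mean")` the means are computed inside the plot and never printed. As an alternative sketch, the same means can be computed first with `summarise()` and drawn with `geom_col()`, so the numbers behind the bars can be inspected. The toy values and the names `bracket_means` and `mean_percent` are assumptions made for this example:

```r
library(dplyr)
library(ggplot2)

# Toy long-format data; values are illustrative only
toy_long <- tibble(
  income_bracket = c("<=60% of SMI", "<=60% of SMI", ">85% of SMI", ">85% of SMI"),
  care_type      = "percent_need_met",
  value          = c(5, 15, 20, 40)
)

# Compute the per-bracket means explicitly
bracket_means <- toy_long %>%
  filter(care_type == "percent_need_met") %>%
  group_by(income_bracket) %>%
  summarise(mean_percent = mean(value, na.rm = TRUE), .groups = "drop")

print(bracket_means)

# Draw the precomputed means as bars
p <- ggplot(bracket_means,
            aes(x = income_bracket, y = mean_percent, fill = income_bracket)) +
  geom_col() +
  labs(x = "Income Bracket", y = "Percent Need Met") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
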

# Identify age groups with the most unserved children
unserved_summary <- childcare_data %>%
  group_by(age_group) %>%
  summarise(total_unserved = sum(unserved_children, na.rm = TRUE)) %>%
  arrange(desc(total_unserved))

print(unserved_summary)
## # A tibble: 4 × 2
##   age_group  total_unserved
##   <chr>               <int>
## 1 School Age        1932057
## 2 Preschool          477336
## 3 Toddler            428300
## 4 Infant             269353
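
These totals sum every row left in the file. The `geographic_unit` column suggests the file may hold more than one geographic level, although only "County" is visible in the preview. If that is the case, the same children could be counted once per level. A hedged sketch of restricting the sum to a single level, using toy data in which the second unit label is hypothetical:

```r
library(dplyr)

# Toy data; "Hypothetical Unit" stands in for any second geographic level
toy <- tibble(
  geographic_unit   = c("County", "County", "Hypothetical Unit", "Hypothetical Unit"),
  age_group         = c("Infant", "Toddler", "Infant", "Toddler"),
  unserved_children = c(100L, 200L, 100L, 200L)
)

# Keep one geographic level before summing
county_unserved <- toy %>%
  filter(geographic_unit == "County") %>%
  group_by(age_group) %>%
  summarise(total_unserved = sum(unserved_children, na.rm = TRUE)) %>%
  arrange(desc(total_unserved))

print(county_unserved)
```

Running `count(childcare_data, geographic_unit)` on the real data would show whether such a filter is needed.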

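Age Groups: School Age has by far the most unserved children in the summed totals (1,932,057), followed by Preschool (477,336), Toddler (428,300), and Infant (269,353). School Age also has the lowest average percent of need met (5.27), while Preschool has the highest (21.2). These figures can help policymakers target resources.

Income Brackets: There are clear disparities in childcare access, with lower-income families having a lower percent of need met.

Data Transformation: Pivoting to a long format let us summarize every care metric by age group with a single group_by() and summarise() call.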