Introduction:
Data source: https://data.wa.gov/education/Childcare-Need-Supply-All-/hiqz-y2vv/data_preview
The Childcare Need and Supply Dataset provides information on the demand for and supply of childcare services across geographic locations, age groups, and income brackets.
Key metrics include estimates of subsidized care, private care, and unserved children, along with the percent of childcare need met. Each metric is segmented by income bracket and age group.
# Load the necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset (adjust the path to wherever the CSV was downloaded)
childcare_data <- read.csv("/Users/aribarazzaq/Desktop/Childcare_Need___Supply__All__20241012.csv")
# Preview the structure
str(childcare_data)
## 'data.frame': 15632 obs. of 10 variables:
## $ Geographic.Unit : chr "County" "County" "County" "County" ...
## $ Geographic.ID : int 53001 53001 53001 53001 53001 53001 53001 53001 53001 53001 ...
## $ Geographic.Name : chr "Adams County" "Adams County" "Adams County" "Adams County" ...
## $ State.Median.Income.Bracket : chr "<=60% of SMI" ">60% and <=75% of SMI" ">75% and <=85% of SMI" ">85% of SMI" ...
## $ Age.Group : chr "Infant" "Infant" "Infant" "Infant" ...
## $ Childcare.Subsidized : int 18 0 0 0 220 0 0 NA 154 0 ...
## $ Private.Care.Estimate : int 14 0 0 4 23 2 0 11 58 5 ...
## $ Estimated.Children.Receiving.Childcare: int 32 0 0 4 243 2 0 NA 212 5 ...
## $ Estimate.of.Unserved : int 245 14 8 38 469 60 20 100 2178 219 ...
## $ Percent.Need.Met : num 11.6 0 0 9.5 34.1 3.2 0 NA 8.9 2.2 ...
# Rename columns for better readability
childcare_data <- childcare_data %>%
  rename(
    geographic_unit = Geographic.Unit,
    geographic_id = Geographic.ID,
    geographic_name = Geographic.Name,
    income_bracket = State.Median.Income.Bracket,
    age_group = Age.Group,
    subsidized_care = Childcare.Subsidized,
    private_care = Private.Care.Estimate,
    children_receiving_care = Estimated.Children.Receiving.Childcare,
    unserved_children = Estimate.of.Unserved,
    percent_need_met = Percent.Need.Met
  )
# Count missing cells across all columns (this is a cell count, not a row count)
sum(is.na(childcare_data))
## [1] 8272
# Drop every row that has a missing value in any column
childcare_data <- childcare_data %>% drop_na()
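Because `sum(is.na(...))` counts missing cells, the figure 8272 does not say how many rows `drop_na()` removes or which columns are affected. The following is a minimal sketch of a per-column check. It assumes a small toy tibble named `toy` (a hypothetical name) holding only the first ten values of each metric as printed by `str()` above, not the full dataset.

```r
library(tibble)
library(tidyr)

# Toy data: first ten values of each metric, copied from the str() output.
toy <- tibble(
  subsidized_care         = c(18, 0, 0, 0, 220, 0, 0, NA, 154, 0),
  private_care            = c(14, 0, 0, 4, 23, 2, 0, 11, 58, 5),
  children_receiving_care = c(32, 0, 0, 4, 243, 2, 0, NA, 212, 5),
  unserved_children       = c(245, 14, 8, 38, 469, 60, 20, 100, 2178, 219),
  percent_need_met        = c(11.6, 0, 0, 9.5, 34.1, 3.2, 0, NA, 8.9, 2.2)
)

# Missing cells per column: shows which metrics carry the NAs.
na_per_column <- colSums(is.na(toy))

# Rows with at least one missing value: these are the rows drop_na() removes.
rows_with_na <- sum(!complete.cases(toy))

# Rows remaining after dropping incomplete rows.
rows_kept <- nrow(drop_na(toy))

print(na_per_column)
print(c(rows_with_na = rows_with_na, rows_kept = rows_kept))
```

In this toy sample, three missing cells sit in a single row, so the cell count overstates the number of rows lost. Running the same check on the full data would show how much of the dataset the later totals exclude.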
# Transform care-related metrics into long format
childcare_long <- childcare_data %>%
  pivot_longer(
    cols = subsidized_care:percent_need_met,
    names_to = "care_type",
    values_to = "value"
  )
# Preview the long format data
head(childcare_long)
## # A tibble: 6 × 7
## geographic_unit geographic_id geographic_name income_bracket age_group
## <chr> <int> <chr> <chr> <chr>
## 1 County 53001 Adams County <=60% of SMI Infant
## 2 County 53001 Adams County <=60% of SMI Infant
## 3 County 53001 Adams County <=60% of SMI Infant
## 4 County 53001 Adams County <=60% of SMI Infant
## 5 County 53001 Adams County <=60% of SMI Infant
## 6 County 53001 Adams County >60% and <=75% of SMI Infant
## # ℹ 2 more variables: care_type <chr>, value <dbl>
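To make the reshaping step concrete, here is a minimal sketch that pivots a single row: the first Adams County row, with values taken from the `str()` output above. The object names `one_row` and `one_row_long` are hypothetical. Each wide row becomes five long rows, one per metric, which is why the first five rows of the preview share the same geography, income bracket, and age group.

```r
library(tibble)
library(tidyr)

# One wide row: Adams County, lowest income bracket, infants.
one_row <- tibble(
  geographic_name         = "Adams County",
  income_bracket          = "<=60% of SMI",
  age_group               = "Infant",
  subsidized_care         = 18,
  private_care            = 14,
  children_receiving_care = 32,
  unserved_children       = 245,
  percent_need_met        = 11.6
)

# Stack the five metric columns into name/value pairs.
one_row_long <- pivot_longer(
  one_row,
  cols = subsidized_care:percent_need_met,
  names_to = "care_type",
  values_to = "value"
)

print(one_row_long)
```

Counts and a percentage now share the `value` column, so any summary of `value` should be grouped by `care_type`.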
# Summarize care metrics by age group
age_group_summary <- childcare_long %>%
  group_by(age_group, care_type) %>%
  summarise(avg_value = mean(value, na.rm = TRUE))
## `summarise()` has grouped output by 'age_group'. You can override using the
## `.groups` argument.
print(age_group_summary)
## # A tibble: 20 × 3
## # Groups: age_group [4]
## age_group care_type avg_value
## <chr> <chr> <dbl>
## 1 Infant children_receiving_care 14.5
## 2 Infant percent_need_met 8.63
## 3 Infant private_care 11.0
## 4 Infant subsidized_care 3.51
## 5 Infant unserved_children 92.3
## 6 Preschool children_receiving_care 97.1
## 7 Preschool percent_need_met 21.2
## 8 Preschool private_care 41.5
## 9 Preschool subsidized_care 55.6
## 10 Preschool unserved_children 184.
## 11 School Age children_receiving_care 64.3
## 12 School Age percent_need_met 5.27
## 13 School Age private_care 41.0
## 14 School Age subsidized_care 23.3
## 15 School Age unserved_children 585.
## 16 Toddler children_receiving_care 38.6
## 17 Toddler percent_need_met 12.2
## 18 Toddler private_care 22.2
## 19 Toddler subsidized_care 16.4
## 20 Toddler unserved_children 148.
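The long summary can be easier to compare across age groups when it is spread back to one row per age group. The following is a minimal sketch that uses a few of the averages printed above as toy input. The names `summary_subset` and `summary_wide` are hypothetical, and only two age groups and two metrics are included.

```r
library(tibble)
library(tidyr)

# A subset of the printed averages, typed in by hand.
summary_subset <- tribble(
  ~age_group,  ~care_type,          ~avg_value,
  "Infant",    "percent_need_met",    8.63,
  "Infant",    "unserved_children",  92.3,
  "Preschool", "percent_need_met",   21.2,
  "Preschool", "unserved_children", 184
)

# One row per age group, one column per metric.
summary_wide <- pivot_wider(
  summary_subset,
  names_from = care_type,
  values_from = avg_value
)

print(summary_wide)
```

These averages are unweighted means over rows (one row per geography, income bracket, and age group), so an average `percent_need_met` here is a mean of row-level percentages, not an overall share of children served.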
# Create a bar plot of average percent need met by income bracket
ggplot(childcare_long %>% filter(care_type == "percent_need_met"),
       aes(x = income_bracket, y = value, fill = income_bracket)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Average Percent of Childcare Need Met by Income Bracket",
       x = "Income Bracket", y = "Percent Need Met") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
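The bar heights in this plot are per-bracket means, and it can help to compute them as a table so the numbers behind the bars can be read directly. The following is a minimal sketch on toy data. The percentages are the first eight `Percent.Need.Met` values from the `str()` output, and the sketch assumes the four bracket labels repeat in the same order for rows five to eight, which the truncated output does not confirm. The names `toy_pct` and `bracket_means` are hypothetical.

```r
library(dplyr)
library(tibble)

brackets <- c("<=60% of SMI", ">60% and <=75% of SMI",
              ">75% and <=85% of SMI", ">85% of SMI")

# Toy data; the repetition of bracket labels is an assumption.
toy_pct <- tibble(
  income_bracket   = rep(brackets, times = 2),
  percent_need_met = c(11.6, 0, 0, 9.5, 34.1, 3.2, 0, NA)
)

# Same statistic the plot draws: mean percent need met per bracket,
# plus the number of non-missing rows behind each mean.
bracket_means <- toy_pct %>%
  group_by(income_bracket) %>%
  summarise(
    avg_percent_need_met = mean(percent_need_met, na.rm = TRUE),
    n_rows = sum(!is.na(percent_need_met))
  )

print(bracket_means)
```

Eight toy rows are far too few to say anything about the full dataset. The point is only that the same `group_by()` and `summarise()` pattern, applied to the real `percent_need_met` column, yields the exact values the bars display.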
# Identify age groups with the most unserved children
unserved_summary <- childcare_data %>%
  group_by(age_group) %>%
  summarise(total_unserved = sum(unserved_children, na.rm = TRUE)) %>%
  arrange(desc(total_unserved))
print(unserved_summary)
## # A tibble: 4 × 2
## age_group total_unserved
## <chr> <int>
## 1 School Age 1932057
## 2 Preschool 477336
## 3 Toddler 428300
## 4 Infant 269353
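One caveat about these totals: they sum `unserved_children` over every remaining row. If the file contains more than one geographic level, the same children would be counted once per level. Only "County" is visible in the `str()` output, so whether other levels exist is not confirmed here, although the `geographic_unit` column and the "All" in the file name suggest it is worth checking. The following is a minimal sketch of that check on toy data. The "Other Level" rows are hypothetical and exist only to show the effect, and all counts are illustrative.

```r
library(dplyr)
library(tibble)

toy_units <- tribble(
  ~geographic_unit, ~age_group,   ~unserved_children,
  "County",         "Infant",      245,
  "County",         "School Age", 2178,
  "Other Level",    "Infant",      245,   # hypothetical second level covering the same children
  "Other Level",    "School Age", 2178
)

# Step 1: see which geographic levels are present and how many rows each has.
unit_counts <- count(toy_units, geographic_unit)

# Step 2: total within a single level so each child is counted once.
county_totals <- toy_units %>%
  filter(geographic_unit == "County") %>%
  group_by(age_group) %>%
  summarise(total_unserved = sum(unserved_children, na.rm = TRUE)) %>%
  arrange(desc(total_unserved))

# For comparison: totals over all rows, which double the toy counts.
all_row_totals <- toy_units %>%
  group_by(age_group) %>%
  summarise(total_unserved = sum(unserved_children, na.rm = TRUE)) %>%
  arrange(desc(total_unserved))

print(unit_counts)
print(county_totals)
print(all_row_totals)
```

If `count()` on the real data shows only "County", the totals above stand as they are. If it shows several levels, filtering to one level first would keep the ranking comparable to actual child counts.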
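Conclusion:
Age Groups: Among rows with complete data, School Age has by far the most unserved children (1,932,057), followed by Preschool (477,336), Toddler (428,300), and Infant (269,353). This ranking can help policymakers target resources.
Income Brackets: The bar plot compares the average percent of need met across income brackets. There are clear disparities in childcare access, with lower-income families having a lower percentage of need met.
Data Transformation: Pivoting to a long format put all five care metrics in one column, which made it possible to summarize them by age group in a single grouped step and to filter to one metric for plotting.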