Project 1 - Kindergarten Vaccination Rates in Montgomery County

Maryland Kindergarten Immunization Records 2023-2024

For this project, I am looking at immunization records for kindergarten students in Montgomery County, Maryland, for the 2023–2024 school year. The data comes directly from the Maryland Department of Health.

https://health.maryland.gov/phpa/OIDEOR/IMMUN/pages/kindergarten_immunization_rates_by_school.aspx

The dataset includes vaccination rates for DTaP, Polio, MMR, Hep B, and Varicella, along with medical and religious exemption rates across public and private schools.

Research Question: How do vaccination rates compare between public and private schools in Montgomery County?

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
library(ggplot2)
data <- readxl::read_excel("/Users/manuelbandini/Downloads/DATA 110 Project 1.1.xlsx")
## Data Cleaning
# When the data was loaded, all numeric columns were read as text (character type)
# This means R cannot perform any calculations on them
# Here I am converting all numeric columns to the correct number format using as.numeric()

data <- data |>
  mutate(
    `TOTAL K Students` = as.numeric(`TOTAL K Students`),
    `Total K Students     WITH records` = as.numeric(`Total K Students     WITH records`),
    `Total K Students WITHOUT records` = as.numeric(`Total K Students WITHOUT records`),
    `% Medical Exemption` = as.numeric(`% Medical Exemption`),
    `% Religious Exemption` = as.numeric(`% Religious Exemption`),
    `% DTaP` = as.numeric(`% DTaP`),
    `% Polio` = as.numeric(`% Polio`),
    `% MMR` = as.numeric(`% MMR`),
    `% Hep B` = as.numeric(`% Hep B`),
    `% Varicella` = as.numeric(`% Varicella`)
  ) |>
  # Remove rows with missing values
  drop_na()
Warning: There were 10 warnings in `mutate()`.
The first warning was:
ℹ In argument: `TOTAL K Students = as.numeric(`TOTAL K Students`)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 9 remaining warnings.
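
As a side note, the ten repeated as.numeric() calls could probably be written in one step. The following is only a minimal sketch on a small made-up table: the `toy` data, its values, and the names `rows_before` and `toy_clean` are hypothetical and are not part of the Maryland file. It uses across() to convert every column except the school type, then counts how many rows the cleaning removed.

```r
library(tidyverse)

# Hypothetical toy data standing in for the Excel sheet (values are made up)
toy <- tibble(
  `Type of School` = c("Public", "Private (non-public)", "Public"),
  `% MMR` = c("0.98", "0.95", "N/A"),
  `% Religious Exemption` = c("0", "0.02", "0.01")
)

rows_before <- nrow(toy)

toy_clean <- toy |>
  # Convert every column except the school type to numeric;
  # text that is not a number (such as "N/A") becomes NA with a coercion warning
  mutate(across(-`Type of School`, as.numeric)) |>
  # Remove rows with missing values
  drop_na()

# Number of rows dropped by the cleaning step
rows_before - nrow(toy_clean)
```

Checking the number of dropped rows this way makes it easier to report how much data the drop_na() step removed.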
# Here I am dividing religious exemption rates into three groups
# Low = schools with no religious exemptions (rate exactly 0)
# Medium = schools with some religious exemptions (above 0, up to 0.05)
# High = schools with a lot of religious exemptions (above 0.05)

data <- data |>
  mutate(Exemption_Level = case_when(
    `% Religious Exemption` == 0 ~ "Low",
    `% Religious Exemption` <= 0.05 ~ "Medium",
    `% Religious Exemption` > 0.05 ~ "High"
  ))
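
One possible refinement, shown here only as a sketch on made-up values (the `toy` table and `level_counts` are hypothetical names), is to store Exemption_Level as a factor so the groups appear in the order Low, Medium, High instead of alphabetical order, and to count how many schools fall in each group.

```r
library(tidyverse)

# Hypothetical exemption rates (made-up values, not the Maryland records)
toy <- tibble(`% Religious Exemption` = c(0, 0.01, 0.05, 0.08, 0))

toy <- toy |>
  mutate(
    # Same thresholds as the grouping step above
    Exemption_Level = case_when(
      `% Religious Exemption` == 0 ~ "Low",
      `% Religious Exemption` <= 0.05 ~ "Medium",
      `% Religious Exemption` > 0.05 ~ "High"
    ),
    # Keep the levels in a meaningful order rather than alphabetical order
    Exemption_Level = factor(Exemption_Level, levels = c("Low", "Medium", "High"))
  )

# Number of schools in each group
level_counts <- count(toy, Exemption_Level)
level_counts
```

With an ordered factor, ggplot legends and boxplot groups would follow the same Low, Medium, High order.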
## Linear Regression
# Here I am running a simple linear regression to see if religious exemption rates
# can predict MMR vaccination rates in Montgomery County schools

model <- lm(`% MMR` ~ `% Religious Exemption`, data = data)
summary(model)

Call:
lm(formula = `% MMR` ~ `% Religious Exemption`, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58395 -0.00395  0.01453  0.01605  0.01605 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              0.98395    0.00421 233.717   <2e-16 ***
`% Religious Exemption`  0.09591    0.14608   0.657    0.512    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05006 on 170 degrees of freedom
Multiple R-squared:  0.002529,  Adjusted R-squared:  -0.003338 
F-statistic: 0.4311 on 1 and 170 DF,  p-value: 0.5123
# Diagnostic plots to check the quality of our regression model
plot(model)
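
To read the regression result without scanning the printed summary, the slope's p-value and the R-squared can be pulled out directly. This is a minimal sketch on made-up numbers: the `toy` data frame and the names `toy_model`, `slope_p`, and `r_squared` are hypothetical, and the 0.05 cutoff is a common convention that I am assuming here.

```r
# Hypothetical toy data (made-up values, not the Maryland records)
toy <- data.frame(
  mmr = c(0.99, 0.97, 0.98, 0.95, 1.00, 0.96),
  religious = c(0.00, 0.02, 0.01, 0.05, 0.00, 0.03)
)

toy_model <- lm(mmr ~ religious, data = toy)
toy_summary <- summary(toy_model)

# Coefficient table: one row per term, with estimate, std. error, t value, p-value
coefs <- coef(toy_summary)
slope_p <- coefs["religious", "Pr(>|t|)"]
r_squared <- toy_summary$r.squared

# Assumed convention: treat p < 0.05 as statistically significant
if (slope_p < 0.05) {
  message("The slope differs significantly from zero")
} else {
  message("No significant linear relationship detected")
}
```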

## Exploring the Data
# First, I am taking a quick look at the distribution of MMR vaccination rates
# across all Montgomery County schools

ggplot(data, aes(x = `% MMR`)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

# Here I am taking a quick look at how MMR rates differ
# between public and private schools

ggplot(data, aes(x = `Type of School`, y = `% MMR`)) +
  geom_boxplot()

# Starting with the basic structure of the final plot
# I am telling ggplot to use school type on the x axis and MMR rate on the y axis,
# and to fill each box with a color based on school type

ggplot(data, aes(x = `Type of School`, y = `% MMR`, fill = `Type of School`)) +
  geom_boxplot()

# Building on the previous plot, I am now setting the fill colors myself
# scale_fill_manual lets me choose my own colors instead of the default ggplot colors

ggplot(data, aes(x = `Type of School`, y = `% MMR`, fill = `Type of School`)) +
  geom_boxplot() +
  scale_fill_manual(values = c("Public" = "blue", 
                               "Private (non-public)" = "red",
                               "Unknown" = "green"))

# Final polished boxplot comparing MMR vaccination rates
# between public and private schools in Montgomery County
# colored by religious exemption level (Low, Medium, High)

ggplot(data, aes(x = `Type of School`, y = `% MMR`, fill = Exemption_Level)) +
  geom_boxplot() +
  scale_fill_manual(values = c("Low" = "blue", 
                               "Medium" = "orange",
                               "High" = "red")) +
  theme_minimal() +
  labs(
    title = "MMR Vaccination Rates in Montgomery County Schools",
    x = "Type of School",
    y = "MMR Vaccination Rate",
    fill = "Religious Exemption Level",
    caption = "Source: Maryland Department of Health"
  )

When I initially loaded the dataset, it had over 190 rows, and all numeric columns were being read as text, most likely because some cells in the Excel file held non-numeric entries (the conversion warned about NAs introduced by coercion). I converted them to numbers using as.numeric(). I also cleaned the data by removing rows with missing values, which left me with 172 rows. Since I was focusing on religious exemptions, I created a new column that groups schools into three levels: Low, Medium, and High.

My data suggested that schools with low religious exemption rates tend to have higher MMR vaccination rates, although the linear regression did not confirm a relationship (slope p-value = 0.512, R-squared = 0.0025). Private schools show more variation than public schools, but overall Montgomery County has very high MMR vaccination compliance. One thing I wish I could have done was compare us to the entire state to see where we stand.
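
The comparison of variation between school types could also be backed up with numbers instead of reading it off the boxplot. The following is a rough sketch on made-up values: the `toy` table, the name `by_type`, and the choice of median, standard deviation, and IQR as summary measures are my assumptions, not results from the actual dataset.

```r
library(tidyverse)

# Hypothetical toy data (made-up values, not the Maryland records)
toy <- tibble(
  `Type of School` = c("Public", "Public", "Public",
                       "Private (non-public)", "Private (non-public)",
                       "Private (non-public)"),
  `% MMR` = c(0.99, 0.98, 1.00, 0.90, 1.00, 0.95)
)

by_type <- toy |>
  group_by(`Type of School`) |>
  summarise(
    schools = n(),                 # number of schools in the group
    median_mmr = median(`% MMR`),  # typical MMR rate
    sd_mmr = sd(`% MMR`),          # spread around the mean
    iqr_mmr = IQR(`% MMR`)         # spread of the middle half (the box height)
  )
by_type
```

A larger sd_mmr or iqr_mmr for one school type would support the statement that its rates vary more.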