This analysis explores Prince George’s County, Maryland, to understand how polling precincts are distributed among its municipalities. By identifying which areas have more or fewer polling locations, this project highlights patterns that may relate to community size, geography, or administrative planning. Publicly available data from princegeorgescountymd
# Load required libraries
library(tidyverse)  # also attaches dplyr, ggplot2, and readr
# Set working directory (adjust if needed)
setwd("~/Downloads/25_Semesters/Fall/DATA101")
# Load the dataset
pg_Polling_Data <- read_csv("PollingPlaces_20251015.csv")
# Create a working copy
df <- pg_Polling_Data
Which city has the most voting precincts, and how does its share of the population influence the total ballots cast in the county?
# Examine the structure of the dataset
str(df)
## spc_tbl_ [327 × 38] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ the_geom : chr [1:327] "POINT (-76.7848419241464 38.8835887653142)" "POINT (-76.95391582927341 38.983168524022986)" "POINT (-76.84676704090307 39.07103766262892)" "POINT (-76.96620700601962 39.00009352728978)" ...
## $ PRECINCTID: chr [1:327] "003-004" "021-019" "010-008" "017-009" ...
## $ FACILITYID: logi [1:327] NA NA NA NA NA NA ...
## $ NAME : chr [1:327] "PERRYWOOD ELEMENTARY SCHOOL" "ST MARK THE EVANGELIST CHURCH - GYM" "DEERFIELD RUN ELEMENTARY SCHOOL - GYM" "MARY HARRIS MOTHER JONES ELEMENTARY SCHOOL" ...
## $ ADDRESS : chr [1:327] "501 WATKINS PARK DRIVE" "7501 ADELPHI ROAD" "13000 LAUREL-BOWIE ROAD" "2405 TECUMSEH STREET" ...
## $ CITY : chr [1:327] "UPPER MARLBORO" "HYATTSVILLE" "LAUREL" "ADELPHI" ...
## $ STATE : chr [1:327] "MD" "MD" "MD" "MD" ...
## $ ZIP_CODE : num [1:327] 20774 20783 20708 20783 20783 ...
## $ TELEPHONE : logi [1:327] NA NA NA NA NA NA ...
## $ TYPE : logi [1:327] NA NA NA NA NA NA ...
## $ FULLADDR : chr [1:327] "501 WATKINS PARK DRIVE" "7501 ADELPHI ROAD" "13000 LAUREL-BOWIE ROAD" "2405 TECUMSEH STREET" ...
## $ MUNICIPALI: chr [1:327] "UPPER MARLBORO" "HYATTSVILLE" "LAUREL" "ADELPHI" ...
## $ AGENCYURL : chr [1:327] "https://www.princegeorgescountymd.gov/559/Board-of-Elections" "https://www.princegeorgescountymd.gov/559/Board-of-Elections" "https://www.princegeorgescountymd.gov/559/Board-of-Elections" "https://www.princegeorgescountymd.gov/559/Board-of-Elections" ...
## $ OPERDAYS : logi [1:327] NA NA NA NA NA NA ...
## $ HANDICAP : logi [1:327] NA NA NA NA NA NA ...
## $ POCNAME : chr [1:327] "Prince George's County Board of Elections" "Prince George's County Board of Elections" "Prince George's County Board of Elections" "Prince George's County Board of Elections" ...
## $ POCPHONE : chr [1:327] "301-341-7300" "301-341-7300" "301-341-7300" "301-341-7300" ...
## $ POCEMAIL : chr [1:327] "election@co.pg.md.us" "election@co.pg.md.us" "election@co.pg.md.us" "election@co.pg.md.us" ...
## $ EARLYVOTIN: logi [1:327] NA NA NA NA NA NA ...
## $ EARLYVOT_1: chr [1:327] "2020 Apr 16 12:00:00 AM" "2020 Apr 16 12:00:00 AM" "2020 Apr 16 12:00:00 AM" "2020 Apr 16 12:00:00 AM" ...
## $ NEXTELECT : chr [1:327] "2020 Apr 28 12:00:00 AM" "2020 Apr 28 12:00:00 AM" "2020 Apr 28 12:00:00 AM" "2020 Apr 28 12:00:00 AM" ...
## $ REGDATE : chr [1:327] "2020 Apr 07 12:00:00 AM" "2020 Apr 07 12:00:00 AM" "2020 Apr 07 12:00:00 AM" "2020 Apr 07 12:00:00 AM" ...
## $ VOTESERVIC: logi [1:327] NA NA NA NA NA NA ...
## $ DROPBOX : chr [1:327] "No" "No" "No" "No" ...
## $ BOXLOCATIO: logi [1:327] NA NA NA NA NA NA ...
## $ COMMENTS : logi [1:327] NA NA NA NA NA NA ...
## $ CONGRESS : num [1:327] 5 4 4 4 4 4 4 5 4 4 ...
## $ LEGIS : chr [1:327] "25" "22" "23" "21" ...
## $ COUNCIL : num [1:327] 6 3 1 2 2 2 3 9 8 9 ...
## $ SCHOOL : num [1:327] 7 3 1 3 3 3 2 9 8 8 ...
## $ ROUTINGREG: logi [1:327] NA NA NA NA NA NA ...
## $ RCN_NAME : chr [1:327] "BOARD OF ELECTIONS (LARGO 95)" "POLICE DISTRICT VI STATION (BELTSVILLE)" "POLICE DISTRICT VI STATION (BELTSVILLE)" "POLICE DISTRICT VI STATION (BELTSVILLE)" ...
## $ GLOBALID : logi [1:327] NA NA NA NA NA NA ...
## $ CREATIONDA: logi [1:327] NA NA NA NA NA NA ...
## $ CREATOR : logi [1:327] NA NA NA NA NA NA ...
## $ EDITDATE : logi [1:327] NA NA NA NA NA NA ...
## $ EDITOR : logi [1:327] NA NA NA NA NA NA ...
## $ IMPRT_DATE: chr [1:327] "2022 Nov 09 12:00:00 AM" "2022 Nov 09 12:00:00 AM" "2022 Nov 09 12:00:00 AM" "2022 Nov 09 12:00:00 AM" ...
## - attr(*, "spec")=
## .. cols(
## .. the_geom = col_character(),
## .. PRECINCTID = col_character(),
## .. FACILITYID = col_logical(),
## .. NAME = col_character(),
## .. ADDRESS = col_character(),
## .. CITY = col_character(),
## .. STATE = col_character(),
## .. ZIP_CODE = col_double(),
## .. TELEPHONE = col_logical(),
## .. TYPE = col_logical(),
## .. FULLADDR = col_character(),
## .. MUNICIPALI = col_character(),
## .. AGENCYURL = col_character(),
## .. OPERDAYS = col_logical(),
## .. HANDICAP = col_logical(),
## .. POCNAME = col_character(),
## .. POCPHONE = col_character(),
## .. POCEMAIL = col_character(),
## .. EARLYVOTIN = col_logical(),
## .. EARLYVOT_1 = col_character(),
## .. NEXTELECT = col_character(),
## .. REGDATE = col_character(),
## .. VOTESERVIC = col_logical(),
## .. DROPBOX = col_character(),
## .. BOXLOCATIO = col_logical(),
## .. COMMENTS = col_logical(),
## .. CONGRESS = col_double(),
## .. LEGIS = col_character(),
## .. COUNCIL = col_double(),
## .. SCHOOL = col_double(),
## .. ROUTINGREG = col_logical(),
## .. RCN_NAME = col_character(),
## .. GLOBALID = col_logical(),
## .. CREATIONDA = col_logical(),
## .. CREATOR = col_logical(),
## .. EDITDATE = col_logical(),
## .. EDITOR = col_logical(),
## .. IMPRT_DATE = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
# Display the first and last few rows
head(df, 5)
## # A tibble: 5 × 38
## the_geom PRECINCTID FACILITYID NAME ADDRESS CITY STATE ZIP_CODE TELEPHONE
## <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <dbl> <lgl>
## 1 POINT (-76… 003-004 NA PERR… 501 WA… UPPE… MD 20774 NA
## 2 POINT (-76… 021-019 NA ST M… 7501 A… HYAT… MD 20783 NA
## 3 POINT (-76… 010-008 NA DEER… 13000 … LAUR… MD 20708 NA
## 4 POINT (-76… 017-009 NA MARY… 2405 T… ADEL… MD 20783 NA
## 5 POINT (-76… 017-011 NA RIDG… 6120 R… HYAT… MD 20783 NA
## # ℹ 29 more variables: TYPE <lgl>, FULLADDR <chr>, MUNICIPALI <chr>,
## # AGENCYURL <chr>, OPERDAYS <lgl>, HANDICAP <lgl>, POCNAME <chr>,
## # POCPHONE <chr>, POCEMAIL <chr>, EARLYVOTIN <lgl>, EARLYVOT_1 <chr>,
## # NEXTELECT <chr>, REGDATE <chr>, VOTESERVIC <lgl>, DROPBOX <chr>,
## # BOXLOCATIO <lgl>, COMMENTS <lgl>, CONGRESS <dbl>, LEGIS <chr>,
## # COUNCIL <dbl>, SCHOOL <dbl>, ROUTINGREG <lgl>, RCN_NAME <chr>,
## # GLOBALID <lgl>, CREATIONDA <lgl>, CREATOR <lgl>, EDITDATE <lgl>, …
tail(df, 5)
## # A tibble: 5 × 38
## the_geom PRECINCTID FACILITYID NAME ADDRESS CITY STATE ZIP_CODE TELEPHONE
## <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <dbl> <lgl>
## 1 POINT (-76… 020-024 NA GLEN… 7801 G… GLEN… MD 20706 NA
## 2 POINT (-76… 020-025 NA LINC… 9800 R… LANH… MD 20706 NA
## 3 POINT (-76… 021-020 NA ST M… 7501 A… HYAT… MD 20783 NA
## 4 POINT (-76… 021-021 NA HILL… 2601 P… ADEL… MD 20783 NA
## 5 POINT (-76… 021-022 NA HILL… 2601 P… ADEL… MD 20783 NA
## # ℹ 29 more variables: TYPE <lgl>, FULLADDR <chr>, MUNICIPALI <chr>,
## # AGENCYURL <chr>, OPERDAYS <lgl>, HANDICAP <lgl>, POCNAME <chr>,
## # POCPHONE <chr>, POCEMAIL <chr>, EARLYVOTIN <lgl>, EARLYVOT_1 <chr>,
## # NEXTELECT <chr>, REGDATE <chr>, VOTESERVIC <lgl>, DROPBOX <chr>,
## # BOXLOCATIO <lgl>, COMMENTS <lgl>, CONGRESS <dbl>, LEGIS <chr>,
## # COUNCIL <dbl>, SCHOOL <dbl>, ROUTINGREG <lgl>, RCN_NAME <chr>,
## # GLOBALID <lgl>, CREATIONDA <lgl>, CREATOR <lgl>, EDITDATE <lgl>, …
# Display column names
names(df)
## [1] "the_geom" "PRECINCTID" "FACILITYID" "NAME" "ADDRESS"
## [6] "CITY" "STATE" "ZIP_CODE" "TELEPHONE" "TYPE"
## [11] "FULLADDR" "MUNICIPALI" "AGENCYURL" "OPERDAYS" "HANDICAP"
## [16] "POCNAME" "POCPHONE" "POCEMAIL" "EARLYVOTIN" "EARLYVOT_1"
## [21] "NEXTELECT" "REGDATE" "VOTESERVIC" "DROPBOX" "BOXLOCATIO"
## [26] "COMMENTS" "CONGRESS" "LEGIS" "COUNCIL" "SCHOOL"
## [31] "ROUTINGREG" "RCN_NAME" "GLOBALID" "CREATIONDA" "CREATOR"
## [36] "EDITDATE" "EDITOR" "IMPRT_DATE"
I will focus on three fields: ZIP_CODE, to identify the area each polling place serves, and CITY and MUNICIPALI, to group polling places by municipality or community.
# Select relevant columns for analysis
df2 <- df |>
select(CITY, ZIP_CODE, MUNICIPALI)
# Check for missing values in key fields
colSums(is.na(df2[, c("CITY", "MUNICIPALI")]))
## CITY MUNICIPALI
## 1 1
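One way to follow up on this check is to look at the affected record and to confirm whether CITY and MUNICIPALI actually carry different information. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical stand-ins for `df2`), so its results say nothing about the real file.

```r
# Hypothetical stand-in for df2; values are invented for illustration only
toy <- data.frame(
  CITY       = c("BOWIE", "LAUREL", NA, "GREENBELT"),
  ZIP_CODE   = c(20715, 20708, 20770, 20770),
  MUNICIPALI = c("BOWIE", "LAUREL", NA, "GREENBELT"),
  stringsAsFactors = FALSE
)

# Rows where either key field is missing
missing_rows <- toy[is.na(toy$CITY) | is.na(toy$MUNICIPALI), ]

# Among complete rows, count how often the two fields disagree
complete   <- !is.na(toy$CITY) & !is.na(toy$MUNICIPALI)
n_mismatch <- sum(toy$CITY[complete] != toy$MUNICIPALI[complete])

print(missing_rows)
print(n_mismatch)
```

If the mismatch count on the real data were zero, the two columns would be interchangeable for grouping and only one of them would be needed.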
The aim of this project is to apply exploratory data analysis (EDA) while keeping assumptions to a minimum. The analysis identifies the community with the most precincts and offers insights about the cities that make up Prince George’s County, Maryland, based on the trends the EDA uncovers. For the visual side of the EDA, I rely on charts such as bubble charts, bar charts, box plots, and histograms. These insights are valuable for data-driven civic planning because they support informed decisions about electoral infrastructure, voter accessibility, and community engagement.
# Filter rows with non-missing MUNICIPALI values
unique_municipali <- df2 |>
filter(!is.na(MUNICIPALI)) |>
select(MUNICIPALI)
# Convert to a factor with alphabetical levels
df2_cities <- unique_municipali |>
mutate(MUNICIPALI = factor(MUNICIPALI, levels = sort(unique(MUNICIPALI))))
# Summarize precinct distribution across municipalities
summary(df2_cities$MUNICIPALI)
## ACCOKEEK ADELPHI BELTSVILLE BERWYN HEIGHTS
## 5 8 6 1
## BLADENSBURG BOWIE BRANDYWINE BRENTWOOD
## 4 35 4 1
## CAMP SPRINGS CAPITOL HEIGHTS CHEVERLY CLINTON
## 1 10 3 13
## COLLEGE PARK COLMAR MANOR DISTRICT HEIGHTS EDMONSTON
## 6 1 5 1
## FORESTVILLE FORT WASHINGTON GLENARDEN GLENN DALE
## 7 16 4 9
## GLENNDALE GREENBELT HILLCREST HEIGHTS HYATTSVILLE
## 1 6 1 18
## LANDOVER LANDOVER HILLS LANHAM LARGO
## 16 3 13 1
## LAUREL MARLOW HEIGHTS MITCHELLVILLE MOUNT RAINIER
## 14 1 8 1
## MT RAINIER NEW CARROLLTON OXON HILL RIVERDALE
## 1 7 9 6
## SEABROOK SEAT PLEASANT SILVER SPRING SUITLAND
## 2 8 1 6
## TAKOMA PARK TEMPLE HILLS UPPER MALRBORO UPPER MARLBORO
## 4 16 2 41
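The summary above lists a few municipalities under more than one spelling (GLENN DALE and GLENNDALE, MOUNT RAINIER and MT RAINIER, UPPER MALRBORO and UPPER MARLBORO). A minimal sketch of how such variants could be merged before counting follows. The choice of canonical spelling is my assumption, and `name_fixes` and `normalize_municipality` are hypothetical helper names, not part of the original analysis.

```r
# Lookup of variant spelling -> assumed canonical spelling
name_fixes <- c(
  "GLENNDALE"      = "GLENN DALE",
  "MT RAINIER"     = "MOUNT RAINIER",
  "UPPER MALRBORO" = "UPPER MARLBORO"
)

# Hypothetical helper: tidy the label, then replace known variants
normalize_municipality <- function(x) {
  x <- toupper(trimws(x))
  hit <- x %in% names(name_fixes)
  x[hit] <- unname(name_fixes[x[hit]])
  x
}

# Counts copied from the summary output for the six affected labels
raw_counts <- c(
  "GLENN DALE"     = 9, "GLENNDALE"      = 1,
  "MOUNT RAINIER"  = 1, "MT RAINIER"     = 1,
  "UPPER MALRBORO" = 2, "UPPER MARLBORO" = 41
)

# Sum the counts within each normalized label
merged <- tapply(raw_counts, normalize_municipality(names(raw_counts)), sum)
print(merged)
```

With these three merges applied, the list would shrink from 44 labels to 41, and UPPER MARLBORO would total 43 precincts.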
# Manual precinct data per municipality
precints <- c(
ACCOKEEK = 5, ADELPHI = 8, BELTSVILLE = 6,
`BERWYN HEIGHTS` = 1, BLADENSBURG = 4, BOWIE = 35,
BRANDYWINE = 4, BRENTWOOD = 1, `CAMP SPRINGS` = 1,
`CAPITOL HEIGHTS` = 10, CHEVERLY = 3, CLINTON = 13,
`COLLEGE PARK` = 6, `COLMAR MANOR` = 1, `DISTRICT HEIGHTS` = 5,
EDMONSTON = 1, FORESTVILLE = 7, `FORT WASHINGTON` = 16,
GLENARDEN = 4, `GLENN DALE` = 9, GLENNDALE = 1,
GREENBELT = 6, `HILLCREST HEIGHTS` = 1, HYATTSVILLE = 18,
LANDOVER = 16, `LANDOVER HILLS` = 3, LANHAM = 13,
LARGO = 1, LAUREL = 14, `MARLOW HEIGHTS` = 1,
MITCHELLVILLE = 8, `MOUNT RAINIER` = 1, `MT RAINIER` = 1,
`NEW CARROLLTON` = 7, `OXON HILL` = 9, RIVERDALE = 6,
SEABROOK = 2, `SEAT PLEASANT` = 8, `SILVER SPRING` = 1,
SUITLAND = 6, `TAKOMA PARK` = 4, `TEMPLE HILLS` = 16,
`UPPER MALRBORO` = 2, `UPPER MARLBORO` = 41
)
# Convert to a dataframe
df_precint <- data.frame(
MUNICIPALI = names(precints),
num_of_precint = as.numeric(precints)
)
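Typing the counts by hand works, but the table can drift out of sync with the data. As an alternative sketch, the same table can be built directly from a column of labels. The `toy` vector below is invented; in the real analysis `df2$MUNICIPALI` would take its place.

```r
# Invented sample of municipality labels, including one missing value
toy <- c("BOWIE", "LAUREL", "BOWIE", NA, "ADELPHI", "BOWIE", "LAUREL")

# table() drops NA by default and orders the labels alphabetically
tab <- table(toy)

counts_df <- data.frame(
  MUNICIPALI     = names(tab),
  num_of_precint = as.numeric(tab),
  stringsAsFactors = FALSE
)
print(counts_df)
```

In dplyr, `count(df2, MUNICIPALI)` produces a similar tally in a column named `n`, but it keeps NA as its own row, so the `filter(!is.na(MUNICIPALI))` step would still be needed.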
# Preview precinct dataset
head(df_precint)
## MUNICIPALI num_of_precint
## 1 ACCOKEEK 5
## 2 ADELPHI 8
## 3 BELTSVILLE 6
## 4 BERWYN HEIGHTS 1
## 5 BLADENSBURG 4
## 6 BOWIE 35
tail(df_precint)
## MUNICIPALI num_of_precint
## 39 SILVER SPRING 1
## 40 SUITLAND 6
## 41 TAKOMA PARK 4
## 42 TEMPLE HILLS 16
## 43 UPPER MALRBORO 2
## 44 UPPER MARLBORO 41
# Sort municipalities by number of precincts
df_precint |> arrange(num_of_precint)
## MUNICIPALI num_of_precint
## 1 BERWYN HEIGHTS 1
## 2 BRENTWOOD 1
## 3 CAMP SPRINGS 1
## 4 COLMAR MANOR 1
## 5 EDMONSTON 1
## 6 GLENNDALE 1
## 7 HILLCREST HEIGHTS 1
## 8 LARGO 1
## 9 MARLOW HEIGHTS 1
## 10 MOUNT RAINIER 1
## 11 MT RAINIER 1
## 12 SILVER SPRING 1
## 13 SEABROOK 2
## 14 UPPER MALRBORO 2
## 15 CHEVERLY 3
## 16 LANDOVER HILLS 3
## 17 BLADENSBURG 4
## 18 BRANDYWINE 4
## 19 GLENARDEN 4
## 20 TAKOMA PARK 4
## 21 ACCOKEEK 5
## 22 DISTRICT HEIGHTS 5
## 23 BELTSVILLE 6
## 24 COLLEGE PARK 6
## 25 GREENBELT 6
## 26 RIVERDALE 6
## 27 SUITLAND 6
## 28 FORESTVILLE 7
## 29 NEW CARROLLTON 7
## 30 ADELPHI 8
## 31 MITCHELLVILLE 8
## 32 SEAT PLEASANT 8
## 33 GLENN DALE 9
## 34 OXON HILL 9
## 35 CAPITOL HEIGHTS 10
## 36 CLINTON 13
## 37 LANHAM 13
## 38 LAUREL 14
## 39 FORT WASHINGTON 16
## 40 LANDOVER 16
## 41 TEMPLE HILLS 16
## 42 HYATTSVILLE 18
## 43 BOWIE 35
## 44 UPPER MARLBORO 41
# Calculate summary statistics
summary_stats <- df_precint |>
summarise(
Mean_num_of_precint = round(mean(num_of_precint, na.rm = TRUE), 0),
Median_num_of_precint = round(median(num_of_precint, na.rm = TRUE), 0),
SD_num_of_precint = round(sd(num_of_precint, na.rm = TRUE), 0),
Min_num_of_precint = min(num_of_precint, na.rm = TRUE),
Max_num_of_precint = max(num_of_precint, na.rm = TRUE)
)
summary_stats
## Mean_num_of_precint Median_num_of_precint SD_num_of_precint
## 1 7 6 8
## Min_num_of_precint Max_num_of_precint
## 1 1 41
stats_summary <- data.frame(
Mean = summary_stats$Mean_num_of_precint,
median = summary_stats$Median_num_of_precint,
SD = summary_stats$SD_num_of_precint,
Min = summary_stats$Min_num_of_precint,
Max = summary_stats$Max_num_of_precint
)
stats_summary
## Mean median SD Min Max
## 1 7 6 8 1 41
The analysis reveals wide variation in the number of precincts across municipalities. Upper Marlboro, the county seat, has the most: 41 under the label UPPER MARLBORO, or 43 if the two records entered as UPPER MALRBORO are counted with it. Several smaller towns, including Brentwood, Edmonston, and Colmar Manor, have only one each. Polling locations are therefore distributed unevenly, which likely reflects differences in population size and administrative role, although the dataset contains no population figures to confirm this. The results suggest that smaller towns may share polling locations, while demand for voting access appears concentrated in major centers like Bowie, Laurel, and Upper Marlboro.
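To put a number on how uneven the distribution is, one option is to compute the share of all precincts held by the largest municipalities. The sketch rebuilds the 44 counts from the sorted listing above as value and frequency pairs; `counts`, `top3`, and `top3_share` are names I introduce here, and the labels are left unmerged, as in the analysis.

```r
# The 44 municipality counts, rebuilt as value x frequency
counts <- rep(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 35, 41),
  times = c(12, 2, 2, 4, 2, 5, 2, 3, 2, 1, 2, 1, 3, 1, 1, 1)
)

total <- sum(counts)

# Share of all precincts held by the three largest labels
top3       <- sort(counts, decreasing = TRUE)[1:3]
top3_share <- sum(top3) / total

cat("Total precincts:", total, "\n")
cat("Top 3 share (%):", round(100 * top3_share, 1), "\n")
```

The three largest labels (UPPER MARLBORO, BOWIE, and HYATTSVILLE) account for 94 of the 326 precincts that have a municipality recorded.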
# Identify municipalities below the mean number of precincts
# (the mean is rounded to 7 here; the unrounded mean is about 7.41)
mean_precinct <- summary_stats$Mean_num_of_precint
cities_below_average <- df_precint |>
filter(num_of_precint < mean_precinct)
# Display count and data
nrow(cities_below_average)
## [1] 27
cities_below_average
## MUNICIPALI num_of_precint
## 1 ACCOKEEK 5
## 2 BELTSVILLE 6
## 3 BERWYN HEIGHTS 1
## 4 BLADENSBURG 4
## 5 BRANDYWINE 4
## 6 BRENTWOOD 1
## 7 CAMP SPRINGS 1
## 8 CHEVERLY 3
## 9 COLLEGE PARK 6
## 10 COLMAR MANOR 1
## 11 DISTRICT HEIGHTS 5
## 12 EDMONSTON 1
## 13 GLENARDEN 4
## 14 GLENNDALE 1
## 15 GREENBELT 6
## 16 HILLCREST HEIGHTS 1
## 17 LANDOVER HILLS 3
## 18 LARGO 1
## 19 MARLOW HEIGHTS 1
## 20 MOUNT RAINIER 1
## 21 MT RAINIER 1
## 22 RIVERDALE 6
## 23 SEABROOK 2
## 24 SILVER SPRING 1
## 25 SUITLAND 6
## 26 TAKOMA PARK 4
## 27 UPPER MALRBORO 2
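Because `mean_precinct` is rounded to a whole number before the comparison, municipalities with exactly 7 precincts are not counted as below average. A quick sketch of the same comparison against the unrounded mean is shown here; the `counts` vector is rebuilt from the sorted listing so the block runs on its own, and the variable names are mine.

```r
# The 44 municipality counts, rebuilt as value x frequency
counts <- rep(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 35, 41),
  times = c(12, 2, 2, 4, 2, 5, 2, 3, 2, 1, 2, 1, 3, 1, 1, 1)
)

rounded_mean <- round(mean(counts), 0)  # threshold used in the analysis
exact_mean   <- mean(counts)            # unrounded threshold

below_rounded <- sum(counts < rounded_mean)
below_exact   <- sum(counts < exact_mean)

cat("Below rounded mean:", below_rounded, "\n")
cat("Below exact mean:  ", below_exact, "\n")
```

Against the unrounded mean, FORESTVILLE and NEW CARROLLTON (7 precincts each) also fall below average, which raises the count from 27 to 29.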
# Bar Chart — Cities Below Average
ggplot(cities_below_average, aes(x = MUNICIPALI, y = num_of_precint)) +
geom_col(fill = "#1f77b4", color = "black") +
labs(
title = "Municipalities Below Average Number of Precincts",
x = "Municipality",
y = "Number of Precincts"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
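With this many bars, alphabetical order makes the heights hard to compare. A possible variation sorts the bars by count with `reorder()` and flips the axes so the labels stay readable. The three-row `toy` data frame is a small subset of `df_precint` chosen for the example.

```r
library(ggplot2)

# Three rows taken from df_precint for illustration
toy <- data.frame(
  MUNICIPALI     = c("ACCOKEEK", "CHEVERLY", "LARGO"),
  num_of_precint = c(5, 3, 1)
)

# reorder() sorts the factor levels by the precinct count
p <- ggplot(toy, aes(x = reorder(MUNICIPALI, num_of_precint), y = num_of_precint)) +
  geom_col(fill = "#1f77b4", color = "black") +
  coord_flip() +
  labs(x = "Municipality", y = "Number of Precincts") +
  theme_minimal()

print(p)
```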
# Bubble Chart of All Cities
ggplot(df_precint, aes(
x = MUNICIPALI,
y = num_of_precint,
size = num_of_precint,
color = MUNICIPALI
)) +
geom_point(alpha = 0.7) +
scale_size_continuous(range = c(3, 15)) +
labs(
title = "Bubble Chart: Precincts by Municipality",
x = "Municipality",
y = "Number of Precincts",
size = "Precinct Count"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
# Group by number of precincts and count how many municipalities share each number
df_precint_summary <- df_precint |>
group_by(num_of_precint) |>
summarise(count = n())
df_precint_summary
## # A tibble: 16 × 2
## num_of_precint count
## <dbl> <int>
## 1 1 12
## 2 2 2
## 3 3 2
## 4 4 4
## 5 5 2
## 6 6 5
## 7 7 2
## 8 8 3
## 9 9 2
## 10 10 1
## 11 13 2
## 12 14 1
## 13 16 3
## 14 18 1
## 15 35 1
## 16 41 1
# Histogram-style bar chart of the distribution of precincts per municipality
ggplot(df_precint_summary, aes(x = num_of_precint, y = count)) +
geom_col(fill = "#1f77b4", color = "black") +
labs(
title = "Distribution of Number of Precincts per Municipality",
x = "Number of Precincts",
y = "Count of Municipalities"
) +
theme_minimal()
At the start of this project, my goal was to determine voter turnout in Prince George’s County. As I worked through the exploratory data analysis, I realized that the publicly available dataset did not contain the information needed to answer that question directly. Even so, the analysis revealed how voting precincts are distributed within the county. The mean number of precincts per municipality was 7, the median was 6, the standard deviation was 8, the minimum was 1, and the maximum was 41. A mean above the median, a standard deviation as large as the mean, and a maximum far above both indicate that the distribution is right-skewed. Although my focus shifted from voter turnout to precinct distribution, the analysis still shows how unevenly voting precincts are organized across Prince George’s County.
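The skew can also be checked numerically. The sketch below compares the unrounded mean and median and computes a simple moment-based skewness coefficient, where a positive value indicates a longer right tail. `skewness_simple` is a hypothetical helper written for this example (base R has no built-in skewness function), and `counts` is rebuilt from the frequency table above.

```r
# The 44 municipality counts, rebuilt as value x frequency
counts <- rep(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 16, 18, 35, 41),
  times = c(12, 2, 2, 4, 2, 5, 2, 3, 2, 1, 2, 1, 3, 1, 1, 1)
)

# Hypothetical helper: third central moment scaled by variance^(3/2)
skewness_simple <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

cat("Mean:    ", round(mean(counts), 2), "\n")
cat("Median:  ", median(counts), "\n")
cat("Skewness:", round(skewness_simple(counts), 2), "\n")
```

Unrounded, the mean is about 7.41 and the median is 5.5, so the gap between them is somewhat wider than the rounded figures of 7 and 6 suggest.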
cat(
  "### Key Insights:\n\n",
  "* The **average number of precincts** across municipalities is ", mean_precinct, ".\n\n",
  "* There are ", nrow(cities_below_average), " municipalities below this average.\n\n",
  "* The municipality with the **most precincts** is UPPER MARLBORO (41).\n\n",
  "* The distribution of precincts is positively skewed, indicating a few large cities dominate the total count.\n",
  sep = ""
)
## ### Key Insights:
##
## * The **average number of precincts** across municipalities is 7.
##
## * There are 27 municipalities below this average.
##
## * The municipality with the **most precincts** is UPPER MARLBORO (41).
##
## * The distribution of precincts is positively skewed, indicating a few large cities dominate the total count.