library(tidyverse)
library(leaflet)
library(ggplot2)
library(RColorBrewer)
setwd("C:/Users/Erika/OneDrive/Desktop/DATA 110")
income_poverty <- read_csv("School_Neighborhood_Poverty_Estimates_Current_-8025457622334316475.csv")
Income to Poverty Ratio - Project 2
Introduction
This data set comes from the National Center for Education Statistics (NCES). It gives school neighborhood poverty estimates based on school locations, using information from the Common Core of Data (CCD) and Census Bureau income data for families with children ages 5-18. IPR stands for income to poverty ratio: family income expressed as a percentage of the poverty threshold. An IPR of 100 means income is exactly at the threshold, values below 100 are under it, and values above 100 are over it (the data set also includes the standard error of each estimate). School ID is the NCES ID, which identifies the school and the district it belongs to. LEA ID stands for “Local Education Agency Identification” and is a similar identifier for the school's local education agency. The variables x and y are longitude and latitude, respectively.
To clean the data, I filtered for only the schools in Maryland using their IDs. I chose this data set because I wanted to compare the ratios of Maryland schools specifically, since I’ve attended some of the schools listed. For my first plot, I compared schools’ estimated income to poverty ratio nationally to that of Maryland. For my map, I plotted the schools in Maryland and provided the name, the estimated ratio, and whether each school is in the upper or lower half of schools above the poverty threshold. Since no Maryland schools in this data were below the threshold, I colored by upper and lower half instead of by above and below. The IPR of Maryland schools ranges from 100 to 999, and the halfway point is around 550, so I labeled schools at or above 550 as the upper half and schools below 550 as the lower half.
Sources for intro:
NCES ID info: https://nces.ed.gov/ccd/quickfacts.asp
LEA ID definition: https://ceds.ed.gov/CEDSElementDetails.aspx?TermId=9153%20
Data set origin and information: https://catalog.data.gov/dataset/school-neighborhood-poverty-estimates-current-eb8bb
First, I loaded the packages and the data so I could start
I looked up the latitude and longitude of Maryland and saved them as values to use later for the map
MD_lat = 39.0458
MD_lon = -76.6413
Filtered for only Maryland schools
I found that the LEA ID reflects the school's location. After looking through the data in R, I found that IDs beginning with 24 belong to Maryland. From this, I found where the Maryland schools start and end in the data set (by OBJECTID) and kept only that range.
# I used "https://stackoverflow.com/questions/51107901/how-do-i-filter-a-range-of-numbers-in-r" to help me filter a range within my data
school_clean <- income_poverty |>
  filter(OBJECTID %in% 39082:40505)
head(school_clean)
# A tibble: 6 × 11
OBJECTID `School ID` Name IPR_EST IPR_SE LAT LON LEAID `School Year`
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 39082 240002701669 The Seed… 214 111 39.3 -76.7 2400… 2021-2022
2 39083 240003000001 Allegany… 291 56 39.7 -78.8 2400… 2021-2022
3 39084 240003000003 Beall El… 242 31 39.7 -78.9 2400… 2021-2022
4 39085 240003000005 Bel Air … 263 63 39.6 -78.9 2400… 2021-2022
5 39086 240003000006 Braddock… 265 46 39.7 -78.8 2400… 2021-2022
6 39087 240003000007 Westmar … 230 41 39.6 -79.0 2400… 2021-2022
# ℹ 2 more variables: x <dbl>, y <dbl>
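Since Maryland IDs begin with 24, the same filter could also be written against the ID prefix instead of a hard-coded OBJECTID range. The following is only a minimal sketch of that idea, not the filter used in this project: `toy_schools` and its rows are made up for illustration, and it assumes LEAID is stored as text, as the `<chr>` column type in the output above suggests.

```r
library(dplyr)

# Toy stand-in for income_poverty; these rows are invented for illustration only.
toy_schools <- data.frame(
  OBJECTID = c(1, 2, 3, 4),
  Name     = c("School A", "School B", "School C", "School D"),
  LEAID    = c("2400030", "2400090", "1100030", "5100120")
)

# Keep only rows whose LEAID starts with "24" (the Maryland prefix noted above).
md_by_prefix <- toy_schools |>
  filter(startsWith(LEAID, "24"))

md_by_prefix$Name
```

A prefix filter like this would not depend on the row order of the file, which may matter if the data set is updated and the OBJECTID values shift.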
Next, I mutated the filtered data to have a new variable giving the level of education
level_school <- school_clean |>
  filter(!grepl("Elementary/Middle", Name)) |>
  mutate(K12 = case_when(
    grepl("Elementary", Name) ~ "Elementary",
    grepl("Middle", Name) ~ "Middle",
    grepl("High", Name) ~ "High",
    TRUE ~ "Others"))
I did the same thing but for the data set of all schools
This is only for my first plot
all_school_level <- income_poverty |>
  filter(!grepl("Elementary/Middle", Name)) |>
  mutate(K12 = case_when(
    grepl("Elementary", Name) ~ "Elementary",
    grepl("Middle", Name) ~ "Middle",
    grepl("High", Name) ~ "High",
    TRUE ~ "Others"))
Mutated the previous data to have a variable that distinguishes what is in Maryland and what isn’t
MD_vs_US <- all_school_level |>
  mutate(compare = case_when(
    OBJECTID %in% 39082:40505 ~ "Maryland",
    TRUE ~ "National"
  ))
Plot 1: Compares the estimated income to poverty ratio of schools nationally to that of Maryland
The estimated IPR is the y-value. The threshold is at 100% (a ratio of 1.0), meaning income is equal to the poverty level, so anything above the dashed line is over 100%. I did not rescale the values because I wanted to show that they are in fact over 100%, meaning that the income (numerator) is greater than the threshold (denominator).
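To make the numerator and denominator concrete, here is a small worked example of the ratio. The income and threshold values are hypothetical, chosen only to show the arithmetic, and do not come from the data set.

```r
# Hypothetical values, for illustration only (not taken from the NCES data).
family_income     <- 60000
poverty_threshold <- 30000

# IPR expressed as a percentage of the poverty threshold.
ipr <- family_income / poverty_threshold * 100
ipr
```

With these made-up numbers the income is twice the threshold, so the IPR is 200, which would sit above the dashed line at 100.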
p1 <- MD_vs_US |>
  ggplot(aes(compare, IPR_EST, fill = K12)) +
  labs(x = "Comparison", y = "Estimated Income to Poverty Ratio",
       title = "Maryland vs National Estimated Income to Poverty Ratio (2021-22)",
       caption = "Source: National Center for Education Statistics (NCES) \n *Maryland is excluded from 'National' in my comparison") +
  # I used "https://stackoverflow.com/questions/71596549/removing-outliers-from-a-box-plot" to help me remove the outliers
  geom_boxplot(outlier.shape = NA) +
  # I used "https://www.rdocumentation.org/packages/ggplot2/versions/0.9.1/topics/geom_vline" for the annotation and "https://www.statology.org/geom_hline-label/" to help me add text to it.
  geom_hline(yintercept = 100, col = "grey", linetype = "dashed") +
  annotate("text", y = 80, x = 0.8, label = "Poverty Threshold: 100(%)", size = 3, color = "#3B3B3B") +
  # "Set3" is an RColorBrewer palette, so it is applied with scale_fill_brewer()
  scale_fill_brewer(palette = "Set3", name = "Education Levels") +
  theme_bw()
p1
Took the Maryland school data and made a new variable that shows the upper and lower halves of the schools above the poverty threshold (threshold is 100)
No schools in Maryland had an IPR of 100 or below, meaning none were below the poverty threshold, so I decided to show which schools were in the upper and lower half of schools above the threshold.
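The halfway point can be checked with a quick calculation. This is only a sketch of the arithmetic behind the 550 cutoff: the bounds 100 and 999 are the ones stated in the text, while `ipr_values` is a hypothetical vector used to show how the labels fall.

```r
# Bounds as described in the text: threshold at 100, maximum value 999.
lower_bound <- 100
upper_bound <- 999

# Halfway point between the bounds; the project rounds this to 550.
midpoint <- (lower_bound + upper_bound) / 2
midpoint

# Hypothetical IPR values, only to show how the 550 cutoff assigns labels.
ipr_values <- c(214, 291, 550, 742)
half <- ifelse(ipr_values >= 550, "Upper", "Lower")
half
```

Note that this splits the range of possible values in half, not the schools themselves, so the two groups do not have to contain the same number of schools.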
# According to the information given on the website where I got the data, the IPR values range from 0 - 999.
# If none are below 100 I just found the half way point of 100 to 999 which is around 550.
# This is how I distinguished upper and lower half.
final_clean <- level_school |>
  mutate(threshold = case_when(
    IPR_EST >= 550 ~ "Upper",
    IPR_EST < 550 ~ "Lower"
  ))
Pop-up tooltip
This shows the school name, IPR estimate, and which half above the poverty threshold it’s in
popupMD <- paste0(
  "<b>School: </b>", final_clean$Name, "<br>",
  "<b>EST Income to Poverty Ratio: </b>", final_clean$IPR_EST, "<br>",
  "<b>Half above poverty level: </b>", final_clean$threshold, "<br>"
)
Map: plotted Maryland schools
pal <- colorFactor(palette = c("#76F5C6", "#FF4AA8"),
                   domain = final_clean$threshold,
                   levels = c("Upper", "Lower"))
leaflet() |>
  setView(lng = MD_lon, lat = MD_lat, zoom = 8) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = final_clean,
    radius = ~IPR_EST,
    color = ~pal(threshold),
    popup = popupMD)
Assuming "LON" and "LAT" are longitude and latitude, respectively
Essay
For my visualization, I compared the estimated income to poverty ratio of Maryland schools with the national estimates. I used a box plot to see the quartiles specifically and to show the median. I hid the outliers because they drew attention away from the boxes, where most of the data sits. I noticed that Maryland has no schools with an estimated IPR below the poverty threshold, but nationally there are some.
For my map, I used the filtered Maryland school data and plotted the schools. In the tooltip I included the name, the estimated IPR, and whether the school is in the upper or lower half of schools above the poverty threshold. Something I wish I could add to the popup is the actual IPR, since this was for the 2021-22 school year and information on what the value actually was should be out already. I would’ve liked to compare the two. I also would’ve liked to show which district each school belongs to as an added location identifier. I looked for an easy way to convert the school ID to the district name, but I couldn't find one, and there were too many schools to do it manually.
In my mapped data I saw a lot more pink, which is the lower half above the threshold, so there are many school neighborhoods with incomes closer to the poverty threshold. More specifically, there is a lot of pink in the Baltimore area and along the edges of the state. Most of the green is in the D.C. and Rockville area.
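One possible starting point for the district idea is to pull the district part out of the school ID. This is only a sketch under an assumption: that the first seven characters of the 12-character NCES school ID are the district's LEA ID, which appears consistent with the School ID and LEAID columns in the head() output but is not confirmed by this data set's documentation. The first two IDs are copied from that output; the third is hypothetical.

```r
# First two IDs are from the head() output; the third is a made-up example.
school_ids <- c("240002701669", "240003000001", "240051000999")

# Assumption: characters 1-7 of the school ID identify the district (LEA).
district_ids <- substr(school_ids, 1, 7)
district_ids
```

This only gives a district code, not a district name. Turning the code into a name would still need a separate lookup table of LEA IDs and district names, which is not part of this data set.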