In 2018 our local school district was considering re-purposing several neighborhood elementary school buildings as Early Childhood Centers. Close elementary schools, consolidate the student populations, and use the buildings and newly available State funding to provide early childhood education (also known as pre-K). Save the district money, and provide more early childhood services to families in need.
Sounds harmless?
Maybe in many suburban environments. But in walkable urban neighborhoods already destabilized by poverty, vacant property, and crime, residents were concerned about losing an elementary school and the stability it brings to a community. An elementary school brings financial stability by attracting young families to buy homes, and it provides a sense of “place.” Children who can walk to school help create a positive, people-centered urban environment (rather than a car-centered one), with dynamics similar to living on a university campus. Local history had already shown the impact of a large, vacant elementary school building in the heart of a neighborhood next door. It wasn’t a pretty picture. Because that building had become vacant after an earlier re-purposing, residents got organized.
The data analyzed here was originally collected by volunteers who conducted a neighborhood survey to bring data to a Community Task Force formed by Community Stakeholders (Teachers, Principals, District Administrators, School Board, City Council Members, Business Leaders, and Concerned Citizens). The Board of Education considered the Community Task Force’s recommendations in deciding how to use funding from the Prop S Bond (read: voter-approved property tax increase). Since the Task Force’s purpose was to make recommendations that the larger community would support in a Bond Issue vote, the School Board had an interest in following them whenever possible. This data was submitted to the Community Task Force to inform its recommendations, and the Bond Issue passed in 2019. But what happened to the schools the district wanted to close and re-purpose?
What follows is a review of the data my neighbors and I collected and submitted to the Task Force in 2018, showcasing some of the skills I have since developed by completing the Google Data Analytics Certificate on Coursera. Since this is a portfolio piece, my goal is to introduce myself and show that I know how to use R and Tableau and what-not. But it would be dishonest to say that I was in any way unbiased about the outcome of this data or the story it tells.
Still, while I had never considered a career in data analytics until August of 2022, I found that I had already done something very similar to a case study in my community organizing activities back in 2018. I am proud of what we accomplished, even if it wasn’t perfectly objective, unbiased data analysis.
The Bond Issue needed to pass, but the initial plan was to close two smaller elementary schools in poor urban neighborhoods in Springfield, including my own: Bowerman Elementary in the Woodland Heights neighborhood, and York Elementary in the Heart of the Westside neighborhood. We needed to convince the Board of Education to invest in these schools rather than sunset them. We were in a bit of a double bind. We couldn’t afford for the Bond Issue to fail, since that would result in more schools closing. But as proposed, our Zone 1 schools were on the chopping block.
Working with a team of concerned neighbors, we expressed our concerns to the Superintendent and representatives of the School Board at a Neighborhood Town Hall meeting held at York Elementary School and organized by our Zone 1 City Councilwoman, Phyllis Ferguson. Given the high attendance (standing room only in the gym) and the raucous applause for our outspoken protest against “re-purposing” the elementary schools, the School Board made the wise move of inviting me and the Zone 1 Councilwoman to join the Community Task Force.
Our only option was to convince the Community Task Force to invest in the schools, keeping them in the neighborhood, and we needed convincing data to prove support for this plan.
We designed a brief survey in Google Forms and made paper copies of the same questions so volunteers could canvass door to door. Our hypothesis was that the neighborhood would not support re-purposing the school but would support expanding early childhood programs at the existing school. We also asked questions about busing, since busing was relevant to the district’s argument for closing elementary schools.
In reviewing this data, I want to dig deeper into trends and questions that were secondary to our original purpose in conducting the survey but still interest me as I continue trying to understand my community better. These questions also let me practice some new visualization and data-cleaning skills.
These questions include:
Was there a connection between support for keeping the school open and completing the survey in person on paper? Could this indicate bias among the volunteer surveyors? I would also like to visualize the number of surveys conducted in person vs. online.
How did parents of young children respond differently to the survey?
How many of the respondents actually lived in Woodland Heights? What methods can be used to clean this data efficiently? How can I honor the commitment to not share contact information from the people who were kind enough to participate in the survey?
An example of the paper survey filled out on clipboards at tabled community events and door-to-door solicitations by volunteers from Woodland Heights Neighborhood Association and Freedom City Church’s Hope Homes.
These survey results were then entered manually through the same Google Form so that all results would flow into a Google Sheet for analysis and visualization. Every paper survey was scanned to PDF for reference and as proof that each respondent was a real person. The scans are currently stored in a Google Drive, organized in folders by street, which also helped us track which areas of the neighborhood we had surveyed during data collection. Since the survey form states, “Your address will not be shared, only referenced to verify that you live within Woodland Heights neighborhood boundaries,” I cleaned the data in the spreadsheet using the following process in Google Sheets. I will show some names and some addresses here, but never names and addresses together.
During the manual data-entry process, paper survey respondents were noted with an * before the name.
This allows quick sorting of the paper respondents in the spreadsheet. Using the autofill function, these respondents were given numbers counting down from 9999.
Online respondents were numbered starting from 1001 in ascending order.
Some entries were deleted due to missing data, with some exceptions made for respondents who chose the minority preference, “Early Childhood Center,” on question #6. For a detailed list of changes, see the Change Log.
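The numbering and anonymizing steps above were done by hand in Google Sheets. For readers who would rather script them, here is a minimal R sketch of the same ID scheme. It assumes hypothetical raw columns `Name` and `Address`, with paper respondents marked by a leading `*` in `Name`; the function name `assign_survey_ids` is also hypothetical, and this is not the process that was actually used.

```r
library(dplyr)

# Hypothetical helper: assumes columns `Name` (paper entries start with "*")
# and `Address`. Paper IDs count down from 9999; online IDs count up from 1001.
assign_survey_ids <- function(responses) {
  names_chr <- as.character(responses$Name)
  # Blank names are treated as online entries (no "*" marker present)
  is_paper <- !is.na(names_chr) & startsWith(names_chr, "*")

  ids <- integer(nrow(responses))
  ids[is_paper] <- 9999L - seq_len(sum(is_paper)) + 1L  # 9999, 9998, ...
  ids[!is_paper] <- 1000L + seq_len(sum(!is_paper))     # 1001, 1002, ...

  responses %>%
    mutate(ID = ids, .before = 1) %>%
    # Drop identifying columns so names and addresses never travel together
    select(-Name, -Address)
}
```

Because the two ranges start far apart, a later filter such as `ID > 5000` can separate paper from online responses, as long as neither group grows past a few thousand entries.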
As stated in the admission of bias above, neither I nor any volunteer who helped conduct the survey was interested in anything other than saving our neighborhood elementary school from being closed. We carefully worded the survey so that the questions would be clear and as unbiased as possible. We wanted to provide data to the Community Task Force to verify what was already known: that the neighborhood and the families who attended the school did not want the school building to be re-purposed. Even with incomplete surveys weighted in favor of the district’s plan to re-purpose (see Change Log), some clear trends were present in the data.
#remove NA values
Only_In_WH_Pref<-Inside_WH[!(is.na(Inside_WH$Preference)),]
#make a bar chart
bar1<-
ggplot(data=Only_In_WH_Pref)+
geom_bar(mapping=aes(x=Preference, fill=Preference))+
labs(title= "Would you prefer to have a K-5 Elementary School or
an Early Childhood Center in the neighborhood?",
subtitle="Woodland Heights Residents only",
caption = "Data collected by volunteers from
Woodland Heights Neighborhood Association
and Freedom City Church")
bar1
Although pie charts are generally frowned upon, in this case, they usefully illustrate the small segment that supported the district’s plan, and the large portion that did not. This was of interest to the Community Task Force in making recommendations to the Board of Education to get a bond issue to pass.
basic_pie<-data.frame(Only_In_WH_Pref %>%
group_by(Preference) %>%
summarize(pref_count=n()))
pct<-round(100*basic_pie$pref_count/sum(basic_pie$pref_count))
pie(basic_pie$pref_count,
labels = paste0(basic_pie$Preference, " ", pct, "%"),
col= c("yellow2", "steelblue"),
main ="Would you prefer to have a K-5 Elementary School or
an Early Childhood Center in the neighborhood?")
Adding the surveys that were just outside the neighborhood boundaries didn’t help the district’s case.
#merge data sets
northside_united<-data.frame(
merge(Inside_WH,Outside_WH, all=TRUE))
#transform Not_In_WH "FALSE" to "Woodland Heights Residents" and "TRUE" to "Nearby Neighbors" for easier to read graph
rep_str = c('TRUE' = 'Nearby Neighbors', 'FALSE' = 'Woodland Heights Residents')
northside_united$Not_In_WH <- str_replace_all(northside_united$Not_In_WH, rep_str)
#remove NA values
unite_Pref<-northside_united[!(is.na(northside_united$Preference)),]
#make a stacked bar chart
bar2<-
ggplot(data=unite_Pref)+
geom_bar(mapping=aes(x=Preference, fill=Not_In_WH))+
labs(title= "Would you prefer to have a K-5 Elementary School or
an Early Childhood Center in the neighborhood?",
subtitle="Woodland Heights Residents
and nearby neighbors",
caption = "Data collected by volunteers from
Woodland Heights Neighborhood Association
and Freedom City Church")
bar2
This was the linchpin question in the survey: it showed how the district’s plan could cost the Prop S Bond Issue the support it needed to pass. The data indicates a clear lack of support for a plan to re-purpose the school. However, we also wanted to give the Community Task Force additional information that would be useful in making recommendations to the School Board.
How did parents of young children respond differently to the survey?
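#convert Yes/No answers to 1/0; the is.numeric() guard skips columns that
#were already converted, so re-running this chunk won't turn every 1 into a 0
if (!is.numeric(northside_united$Elementary_Kids_YN)) {
northside_united$Elementary_Kids_YN<-
ifelse(northside_united$Elementary_Kids_YN=='Yes',1,0)
}
if (!is.numeric(northside_united$PreK_YN)) {
northside_united$PreK_YN<-
ifelse(northside_united$PreK_YN=='Yes',1,0)
}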
#create Parent Type Labels
northside_united<-northside_united %>%
mutate(parent_type = case_when(
Elementary_Kids_YN == 1 & PreK_YN ==1 ~ 'Parents of Both',
Elementary_Kids_YN == 1 & PreK_YN ==0 ~ 'Elementary Parents',
Elementary_Kids_YN == 0 & PreK_YN ==1 ~ 'PreK Parents',
Elementary_Kids_YN == 0 & PreK_YN ==0 ~ 'Without Young Children'),
.after = PreK_YN
)
#make a stacked bar chart for all neighbors
bar3<-
ggplot(data=northside_united)+
geom_bar(mapping=aes(x=School_Importance_Rating, fill=parent_type))+
labs(title= "How important is the presence of Bowerman
Elementary School to our neighborhood?",
subtitle="Woodland Heights Residents
and nearby neighbors",
caption = "1 = Not Important at all
5= Very Important")
bar3
#Everyone in the neighborhood bar graph
bar4<-
ggplot(data=northside_united)+
#na.rm=TRUE silently drops the 3 respondents who left this question blank
geom_bar(mapping=aes(x=Bus_away_rating, fill=parent_type), na.rm=TRUE)+
labs(title= "How would you feel about busing students from
our neighborhood to another elementary school?",
subtitle="Woodland Heights Residents
and nearby neighbors",
caption = "1 = Strongly Dislike
5 = Strongly Favor")
bar4
#Only the parents of elementary and pre-K children
parents_only<-filter(northside_united, parent_type != "Without Young Children")
bar5<-
ggplot(data=parents_only)+
#na.rm belongs in geom_bar(), not inside aes(); it drops the 1 blank answer
geom_bar(mapping=aes(x=Bus_away_rating, fill=parent_type), na.rm=TRUE)+
labs(title= "How would you feel about busing students from
our neighborhood to another elementary school?",
subtitle="Woodland Heights Residents
and nearby neighbors",
caption = "1 = Strongly Dislike
5 = Strongly Favor")
bar5
The final busing question refers to the district’s plan to bus pre-K students to elementary schools re-purposed as early childhood centers. This was relevant because pre-K services were already available to parents at nearby elementary schools within walking distance. Parents of elementary students and respondents without young children responded in the same pattern, but only parents of pre-K-age children are included in these results.
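Because the parent groups differ a lot in size, stacked counts make it hard to judge whether two groups "responded in the same pattern." One option is to compare shares within each group instead of raw counts. Below is a minimal R sketch of such a helper; the name `rating_shares` is hypothetical, and it assumes the `parent_type` column built earlier plus a numeric rating column such as `Bus_away_rating` or `Bus_preK_rating`.

```r
library(dplyr)

# Hypothetical helper: share of each rating value within each group,
# so groups of very different sizes can be compared on a 0-1 scale.
rating_shares <- function(df, rating_col, group_col = "parent_type") {
  df %>%
    # Drop respondents who skipped this rating question
    filter(!is.na(.data[[rating_col]])) %>%
    # Count each (group, rating) combination
    count(group = .data[[group_col]], rating = .data[[rating_col]]) %>%
    # Convert counts into within-group proportions
    group_by(group) %>%
    mutate(share = n / sum(n)) %>%
    ungroup()
}
```

For example, `rating_shares(northside_united, "Bus_away_rating")` returns one row per group and rating, which could be plotted with ggplot2's `geom_col(position = "dodge")` so every group's bars sit on the same scale regardless of how many people were in it.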
PreK_parents<-filter(northside_united,
parent_type %in% c('PreK Parents', 'Parents of Both'))%>%
drop_na(Bus_preK_rating)
ggplot(data=PreK_parents)+
geom_bar(mapping=aes(x=Bus_preK_rating, fill=parent_type))+
labs(title= "How would you feel about allowing your
3 or 4 year old to ride the bus?",
subtitle="Woodland Heights Residents
and nearby neighbors",
caption = "1 = Strongly Dislike
5 = Strongly Favor")
Surprisingly, more of our surveys were conducted in person on paper than online. There could be a number of reasons for this, including internet access, literacy, or low interest in finding the link to complete the survey without a person there to ask the questions. More data would be needed to explain why more responses came in on paper than online. But the treemap shows the number of paper vs. online responses, as well as the subgrouping by preference for Early Childhood or Elementary.
#split paper and online results
paper<-
filter(northside_united, ID > 5000) %>%
drop_na(Preference) %>%
drop_na(School_Importance_Rating)
online<-
filter(northside_united, ID < 5000) %>%
drop_na(Preference) %>%
drop_na(School_Importance_Rating)
library(treemap)
#Count by survey type and preference
PapervWeb<-northside_united%>%
mutate(survey_type = case_when(
ID > 5000 ~ 'Paper Surveys',
ID < 5000 ~ 'Online Surveys')) %>%
drop_na(Preference) %>%
count(survey_type, Preference) %>%
rename("Survey Type"='survey_type', "Total"='n')
#Make a treemap
t1<- treemap(PapervWeb,
index=c("Survey Type", "Preference"),
vSize = "Total",
type = "index",
palette = "Set2",
title= "Paper vs. Online Surveys"
)
Although there were many more paper surveys than online surveys, each grouping has a very similar shape. The vast majority of respondents consider the presence of Bowerman Elementary very important to the neighborhood and would rather have an elementary school than an Early Childhood Center in the neighborhood. The scales of the charts differ, but the shape of the data is nearly the same.
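The charts compare the two survey types by eye. A formal check of the earlier question (whether preference was connected to taking the survey on paper) is a test of independence on the survey-type by preference table. Below is a minimal R sketch, assuming the ID convention used above (paper IDs above 5000) and the `Preference` column; the name `survey_mode_test` is hypothetical, and no result from the real data is reported here.

```r
# Hypothetical helper: cross-tabulates survey type against Preference and
# runs Fisher's exact test of independence on that table.
survey_mode_test <- function(df) {
  # Keep only rows with both an ID and a stated preference
  df <- df[!is.na(df$Preference) & !is.na(df$ID), ]
  # Same convention as above: paper IDs count down from 9999
  survey_type <- ifelse(df$ID > 5000, "Paper", "Online")
  tab <- table(survey_type, df$Preference)
  # Fisher's exact test tolerates small cell counts, which a small
  # "Early Childhood Center" group is likely to produce
  list(table = tab, test = fisher.test(tab))
}
```

Calling `survey_mode_test(northside_united)` would return the table and the test. With few Early Childhood Center responses the test has limited power, and even a clear result could not distinguish surveyor bias from differences in who was reached in person versus online.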
p1<-ggplot(data=paper)+
geom_bar(mapping=aes(x=School_Importance_Rating), fill = 'darkslategray4')+
labs(title= "How important is
Bowerman Elementary School?",
subtitle="Paper Surveys",
caption = "1 = Not Important at all
5= Very Important")
p2<-ggplot(data=paper)+
geom_bar(mapping=aes(x=Preference), fill = 'darkslategray4')+
labs(title= "Preference for pre-K
or Elementary School?",
subtitle="Paper Surveys")
library(gridExtra)
grid.arrange(p1, p2, nrow=1)
o1<-ggplot(data=online)+
geom_bar(mapping=aes(x=School_Importance_Rating), fill = 'deepskyblue3')+
labs(title= "How important is
Bowerman Elementary School?",
subtitle="Online Surveys",
caption = "1 = Not Important at all
5= Very Important")
o2<-ggplot(data=online)+
geom_bar(mapping=aes(x=Preference), fill = 'deepskyblue3')+
labs(title= "Preference for pre-K
or Elementary School?",
subtitle="Online Surveys")
grid.arrange(o1, o2, nrow=1)
In the end, the survey provided clear evidence that the public would not vote in favor of a tax increase that required closing neighborhood schools. By making this message clear to my colleagues on the Community Task Force, we were able to come up with a Bond proposal that passed. The Community Task Force recommended keeping both York and Bowerman open. York is currently being rebuilt on its original site in the neighborhood. Bowerman has yet to see significant renovation but remains open in Woodland Heights. The Prop S Bond Issue passed with enthusiastic support from the public.
Completing this project also allowed me to get more familiar with the various capabilities of the tidyverse packages, especially dplyr and ggplot2, as well as some additional packages. Soon, I hope to add to my portfolio some examples of using SQL & Tableau.
My recommendation to anyone who wants a hard-working, quick-learning problem-solver to do some data analysis is that they should hire me.