According to Kaggle.com, this dataset contains the average SAT scores for New York City public high schools. The data are organized into 22 columns. They cover the school ID, school name, borough, building code, street address (with city, state, and zip code), latitude/longitude coordinates, phone number, start and end times, student enrollment with a racial breakdown, average scores on each SAT section for the 2014-2015 school year, and the percentage of students tested. There are 435 rows, one for each accredited high school in New York City. However, not every accredited school has recorded SAT scores: 60 schools have no reported scores. For this project, I will remove the schools without SAT scores.
This dataset is not a sample. In 2015, the New York City Department of Education compiled and published the high school data, with SAT score averages and testing rates provided by the College Board. Potential issues include the lack of file and column descriptions. Additionally, the update frequency of this dataset is not specified.
I am interested in exploring and visualizing the range of math SAT scores in New York City. I am curious whether the best SAT scores are concentrated in a specific borough or area, and how many schools score above the national average for SAT math. Finally, I am curious about the demographics of the schools with the highest and lowest math SAT scores in each borough.
library(leaflet)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
getwd()
## [1] "/Users/blaisesevier/Desktop/DS 4001"
nycscores <- read.csv("scoresnyc.csv", sep= ",") #This is the original dataset
any(is.na(nycscores)) # checking for NAs --> TRUE
## [1] TRUE
sum(is.na(nycscores)) # total NA cells --> 187 (60 rows x 3 SAT columns + 7 enrollment values)
## [1] 187
colSums(is.na(nycscores)) # NA count for each column
## School.ID School.Name
## 0 0
## Borough Building.Code
## 0 0
## Street.Address City
## 0 0
## State Zip.Code
## 0 0
## Latitude Longitude
## 0 0
## Phone.Number Start.Time
## 0 0
## End.Time Student.Enrollment
## 0 7
## Percent.White Percent.Black
## 0 0
## Percent.Hispanic Percent.Asian
## 0 0
## Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 60 60
## Average.Score..SAT.Writing. Percent.Tested
## 60 0
clean_nycscores <- na.omit(nycscores) # removing every row that contains an NA
nrow(clean_nycscores) #There are 375 rows now in this data set.
## [1] 375
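The 187 missing values above are consistent with 60 schools missing all three SAT scores (60 x 3 = 180) plus 7 missing enrollment values. na.omit() drops a row if any column is NA, so it would also remove a school that is only missing enrollment. Because 435 - 375 = 60, the 7 enrollment gaps in this dataset appear to fall within the 60 rows without scores. The sketch below uses a small hypothetical data frame (not the Kaggle data) to contrast na.omit() with filtering on the math score alone.

```r
# Hypothetical toy data: school D is missing enrollment but has a math score.
toy <- data.frame(
  School.Name = c("A", "B", "C", "D"),
  Student.Enrollment = c(300, NA, 450, NA),
  Average.Score..SAT.Math. = c(410, NA, 520, 390)
)

sum(is.na(toy))        # 3 NA cells in total
nrow(na.omit(toy))     # 2: drops B (no score) and D (no enrollment)

# Keep every school that has a math score, even if another column is NA
has_math <- toy[!is.na(toy$Average.Score..SAT.Math.), ]
nrow(has_math)         # 3: only B is dropped
```

On the real data, both approaches should return 375 rows; filtering on the math column simply makes the intent explicit.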
condensed_nyc <- clean_nycscores[ , c(2,3,9,10, 19,20,21)] # Condensed columns to School Name, Borough, Latitude, Longitude, and Average SAT Scores (Math, Reading, Writing).
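Selecting columns by position works, but it silently picks the wrong columns if the CSV's column order ever changes. A minimal alternative, assuming the column names shown in the colSums() output above, is to select by name. The one-row toy data frame here is hypothetical and only mimics some of the real columns.

```r
# Hypothetical one-row stand-in for clean_nycscores (extra columns included on purpose)
toy <- data.frame(
  School.ID = "X01", School.Name = "Example High School", Borough = "Queens",
  Latitude = 40.70, Longitude = -73.80, Phone.Number = "000-000-0000",
  Average.Score..SAT.Math. = 450, Average.Score..SAT.Reading. = 440,
  Average.Score..SAT.Writing. = 430
)

# The columns kept in condensed_nyc, named instead of numbered
keep_cols <- c("School.Name", "Borough", "Latitude", "Longitude",
               "Average.Score..SAT.Math.", "Average.Score..SAT.Reading.",
               "Average.Score..SAT.Writing.")

condensed_toy <- toy[, keep_cols]
names(condensed_toy)
```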
For the purposes of this project, I am going to focus on one SAT subject area, math. Although it would be beneficial to study all three of the subject areas, I feel that it is important to focus on one subject area to ensure a thorough analysis.
In 2015, a record 1.70 million students from the class of 2015 took the SAT. According to collegeboard.org, the math SAT national average was 511.
Below, you will see a variety of data exploration steps. For each borough, a summary shows the minimum, 1st quartile, median, mean, 3rd quartile, and maximum of the average math SAT scores in condensed_nyc.
Before every borough analysis, I will run this summary to get an idea of the spread of the data. Then, you will see a map that shows each school’s name, location, and math SAT score. The math SAT score determines the color of each marker.
A marker is red if the school’s math SAT score is at or below the 1st quartile of the scores shown on that map. A marker is orange if the school’s score is above the 1st quartile but below the national average (511). A marker is green if the school’s score is equal to or higher than the national average.
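The three-color rule can also be written as one vectorized function instead of a separate sapply() helper for each map. This is a sketch, not the code used for the maps: the hypothetical score_color() computes the 1st quartile with quantile() instead of relying on a hard-coded, rounded cutoff (for example, 444 for Staten Island, where summary() reports 444.8). For whole-number scores the colors come out the same.

```r
# Hypothetical helper: red = at or below the 1st quartile of the scores passed in,
# orange = above that but below the national average, green = at or above it.
score_color <- function(scores, national_avg = 511) {
  q1 <- unname(quantile(scores, 0.25, na.rm = TRUE))
  ifelse(scores <= q1, "red",
         ifelse(scores < national_avg, "orange", "green"))
}

# Toy scores (not from the dataset); their 1st quartile is 350
score_color(c(300, 400, 511, 600, 350))
# returns "red" "orange" "green" "green" "red"
```

In the maps, its result could be passed straight to awesomeIcons(), e.g. markerColor = score_color(newdata_queens$Average.Score..SAT.Math.).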
Before looking into each borough, I will start by looking at the entire dataset to get an idea of the spread of the math SAT scores, and the general marker patterns.
Disclaimer: I do not believe that test scores are a complete representation of a school’s success. Judging a school solely on its SAT scores ignores many other measures of success. Since this is only a surface-level analysis, I believe that looking at a school’s average math SAT score is a way to start learning more about a school.
summary(condensed_nyc$Average.Score..SAT.Math.) #Total Average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 317.0 386.0 415.0 432.9 458.5 754.0
getColor_set <- function(condensed_nyc) {
sapply(condensed_nyc$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
if(Average.Score..SAT.Math. <= 386 ) {
"red"
} else if(Average.Score..SAT.Math. < 511) { # above the 1st quartile, below the national average
"orange"
} else { # at or above the national average (511)
"green"
} })
}
icons_set <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor_set(condensed_nyc)
)
leaflet(condensed_nyc) %>% addTiles() %>%
addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_set, popup = ~as.character(condensed_nyc$School.Name), label=~as.character(Average.Score..SAT.Math.))
The unfortunate reality of this map is the enormous number of red and orange markers. Out of the 375 accredited high schools, only 45 (12%) scored at or above the national average. This means that 88% of NYC accredited high schools scored below the national average on the 2015 math SAT.
If you zoom in on the map of NYC, you can see some clusters of green, orange, and red markers.
nrow(condensed_nyc[which(condensed_nyc$Average.Score..SAT.Math. >= 511),]) # 45 schools are at or above the national average
## [1] 45
This code chunk is the first step toward understanding and comparing math SAT scores across boroughs. However, it is important to note how the number of reported schools changed after I removed the NAs from the dataset.
The breakdown is as follows: 20 (17%) schools were removed from the Bronx, 12 (9.9%) from Brooklyn, 17 (16%) from Manhattan, and 11 (14%) from Queens. No schools were removed from Staten Island.
Given that different numbers of schools were removed from each borough, it is important to factor this missing information into the analysis that follows. The missing data also raises the question of why 60 accredited schools in NYC did not report their scores.
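The removal percentages can be recomputed from the before-and-after borough counts in the summary() output of this report. With the real data, the counts could come from table(nycscores$Borough) and table(condensed_nyc$Borough): table() returns counts whether Borough is a factor or a character column, whereas summary() only returns counts for factors (and read.csv() stopped creating factors by default in R 4.0.0). This sketch hard-codes the counts.

```r
# Schools per borough before and after removing rows with NAs
# (counts copied from the summary() output in this report)
before <- c(Bronx = 118, Brooklyn = 121, Manhattan = 106, Queens = 80, `Staten Island` = 10)
after  <- c(Bronx = 98,  Brooklyn = 109, Manhattan = 89,  Queens = 69, `Staten Island` = 10)

removed     <- before - after
pct_removed <- round(100 * removed / before, 1)

data.frame(Before = before, After = after, Removed = removed, PctRemoved = pct_removed)
sum(removed)   # 60 schools removed in total
```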
summary(nycscores$Borough) #BEFORE THE CLEANED DATA
## Bronx Brooklyn Manhattan Queens Staten Island
## 118 121 106 80 10
summary(condensed_nyc$Borough) #AFTER THE CLEANED DATA
## Bronx Brooklyn Manhattan Queens Staten Island
## 98 109 89 69 10
newdata_brooklyn <- condensed_nyc[ which(condensed_nyc$Borough =='Brooklyn'),] #separation for future visuals
newdata_manhattan <-condensed_nyc[which(condensed_nyc$Borough == "Manhattan"),]
newdata_bronx <-condensed_nyc[which(condensed_nyc$Borough == "Bronx"),]
newdata_StatenIsland <-condensed_nyc[which(condensed_nyc$Borough == "Staten Island"),]
newdata_queens <- condensed_nyc[which(condensed_nyc$Borough == "Queens"),]
library(ggplot2)
The national averages for the 2015 SAT Math, Reading, and Writing sections were 511, 495, and 484, respectively. Below you will find maps with markers for each accredited high school in Brooklyn, Queens, Manhattan, Staten Island, and the Bronx. Before each map, I have provided a summary of the minimum, quartiles, median, mean, and maximum scores.
Additionally, below is a boxplot that summarizes the NYC math SAT scores by borough. Staten Island has the highest median math SAT score (465.5), while Brooklyn (395.0) and the Bronx (395.5) have the lowest medians; the Bronx also has the lowest mean (404.4). However, the number of schools recorded in each borough varies widely (from 10 in Staten Island to 109 in Brooklyn), which makes the boroughs difficult to compare.
bp <- ggplot(clean_nycscores, aes(x=Borough, y =Average.Score..SAT.Math., group = Borough)) + geom_boxplot(aes(fill = Borough)) + labs(title = "Average SAT Math Scores", x = "Borough", y = "MATH SAT SCORE")
bp #This is a boxplot that graphs the Average SAT Math Scores and compares the median by Borough
med_bronx <- median(newdata_bronx$Average.Score..SAT.Math.) #set up for median and mean comparison tables
mean_bronx <-mean(newdata_bronx$Average.Score..SAT.Math.)
med_SI <- median(newdata_StatenIsland$Average.Score..SAT.Math.)
mean_SI <-mean(newdata_StatenIsland$Average.Score..SAT.Math.)
med_queens <-median(newdata_queens$Average.Score..SAT.Math.)
mean_queens<-mean(newdata_queens$Average.Score..SAT.Math.)
med_brook <- median(newdata_brooklyn$Average.Score..SAT.Math.)
mean_brook <- mean(newdata_brooklyn$Average.Score..SAT.Math.)
med_man <- median(newdata_manhattan$Average.Score..SAT.Math.)
mean_man <- mean(newdata_manhattan$Average.Score..SAT.Math.)
cbind("Bronx", "Staten Island", "Queens", "Brooklyn", "Manhattan")
## [,1] [,2] [,3] [,4] [,5]
## [1,] "Bronx" "Staten Island" "Queens" "Brooklyn" "Manhattan"
mean_median_df <- data.frame("Borough" = rbind("Bronx", "Staten Island", "Queens", "Brooklyn", "Manhattan"), "Median" = rbind(med_bronx, med_SI, med_queens, med_brook, med_man), "Mean" = rbind(mean_bronx, mean_SI, mean_queens, mean_brook, mean_man))
mean_median_df#table that shows the Median/Mean of all the boroughs
## Borough Median Mean
## med_bronx Bronx 395.5 404.3571
## med_SI Staten Island 465.5 486.2000
## med_queens Queens 448.0 462.3623
## med_brook Brooklyn 395.0 416.4037
## med_man Manhattan 433.0 455.8876
mean_graph <- ggplot(mean_median_df, aes(x=Borough, y = Mean)) + geom_bar(stat= "identity") + labs(title = "Mean Math SAT Score for NYC Accredited Schools")
median_graph <- ggplot(mean_median_df, aes(x=Borough, y = Median)) + geom_bar(stat= "identity") + labs(title = "Median Math SAT Score for NYC Accredited Schools")
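The ten med_*/mean_* variables above can also be produced in one step by grouping on Borough. The hypothetical borough_summary() below is a sketch using base R's tapply(); it also counts the schools at or above the 511 national average, the figure each borough section reports. The toy data frame is made up for illustration.

```r
# Hypothetical helper: per-borough school count, median, mean, and the
# number/percentage of schools at or above the national average.
borough_summary <- function(df, score_col = "Average.Score..SAT.Math.",
                            national_avg = 511) {
  x   <- df[[score_col]]
  grp <- factor(df$Borough)            # groups in alphabetical order
  at_or_above <- x >= national_avg
  data.frame(
    Borough      = levels(grp),
    Schools      = as.vector(tapply(x, grp, length)),
    Median       = as.vector(tapply(x, grp, median)),
    Mean         = round(as.vector(tapply(x, grp, mean)), 1),
    AtOrAbove    = as.vector(tapply(at_or_above, grp, sum)),
    PctAtOrAbove = round(100 * as.vector(tapply(at_or_above, grp, mean)), 1)
  )
}

# Toy data (not from the dataset)
toy <- data.frame(
  Borough = c("Bronx", "Bronx", "Queens", "Queens", "Queens"),
  Average.Score..SAT.Math. = c(380, 420, 440, 456, 700)
)
borough_summary(toy)
```

On the real data, borough_summary(condensed_nyc) should reproduce the medians and means in mean_median_df, with the rows in alphabetical order.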
Now that we have analyzed the full dataset, we will look at each borough in turn.
As a reminder: a red marker means the school’s score is at or below the 1st quartile of that borough’s scores. An orange marker means the score is above the 1st quartile but below the national average. A green marker means the score is at or above the national average (511).
Source for the 2015 national averages: https://secure-media.collegeboard.org/digitalServices/pdf/sat/total-group-2015.pdf
In 2015, out of the 109 accredited high schools in Brooklyn, only 8 scored at or above the national average. That is less than 10% (7.3%) of the schools. This map is dominated by red and orange markers. There does not seem to be much of a pattern among the schools with green markers, but a large number of accredited high schools in northeastern Brooklyn have below-average math SAT scores.
The schools that scored at or above the national average are as follows:
[1] Brooklyn Latin School
[2] Fort Hamilton High School
[3] Midwood High School
[4] Millennium Brooklyn High School
[5] Brooklyn Technical High School
[6] John Dewey High School
[7] Medgar Evers College Preparatory School
[8] Leon M. Goldstein High School for the Sciences
Although every school in this dataset is a public school, the names of several of these schools suggest that they are specialized or selective schools. This might suggest that these schools have access to different or greater resources than the average public school.
The school with the lowest math SAT score in Brooklyn was the Multicultural High School. Its demographic breakdown is as follows: 99.6% Hispanic and 0.4% White.
The school with the highest math SAT score in Brooklyn was Brooklyn Technical High School. Its demographic breakdown is as follows: 60.5% Asian, 20.5% White, 7.7% Black, and 7.6% Hispanic.
brooklyn_lowest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 319),])
brooklyn_lowest
## School.ID School.Name Borough Building.Code
## 290 19K583 Multicultural High School Brooklyn K420
## Street.Address City State Zip.Code Latitude Longitude
## 290 999 Jamaica Avenue Brooklyn NY 11208 40.69114 -73.86843
## Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 290 718-827-2796 8:15 AM 3:15 PM 229 0.4%
## Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 290 0.0% 99.6% 0.0% 319
## Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 290 323 284 28.6%
brooklyn_highest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 682 ),])
brooklyn_highest
## School.ID School.Name Borough Building.Code
## 327 13K430 Brooklyn Technical High School Brooklyn K430
## Street.Address City State Zip.Code Latitude Longitude
## 327 29 Ft Greene Place Brooklyn NY 11217 40.68811 -73.97675
## Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 327 718-804-6400 8:45 AM 3:15 PM 5447 20.5%
## Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 327 7.7% 7.6% 60.5% 682
## Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 327 608 606 95.5%
brooklyn_511 <- (newdata_brooklyn[which(newdata_brooklyn$Average.Score..SAT.Math. >= 511),]) # Schools with a math SAT score at or above 511.
brooklyn_511_schoolnames <- brooklyn_511[,1] # the names of the schools that scored at or above 511
nrow(newdata_brooklyn[which(newdata_brooklyn$Average.Score..SAT.Math. >= 511),]) # 8 schools scored at or above 511
## [1] 8
nrow(newdata_brooklyn[which(newdata_brooklyn$Average.Score..SAT.Math. < 511),]) # 101 schools scored below 511
## [1] 101
summary(newdata_brooklyn$Average.Score..SAT.Math.) #Identifying the Min, Median, Mean, Max of Math SAT Scores, 511 is the average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 319.0 379.0 395.0 416.4 441.0 682.0
getColor <- function(newdata_brooklyn) {
sapply(newdata_brooklyn$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
if(Average.Score..SAT.Math. <= 379) {
"red"
} else if(Average.Score..SAT.Math. < 511) { # above the 1st quartile, below the national average
"orange"
} else { # at or above the national average (511)
"green"
} })
}
icons <- awesomeIcons( #this is creating the icons.
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor(newdata_brooklyn) #this refers back to the getColor function
)
leaflet(newdata_brooklyn) %>% addTiles() %>% #the map!
addAwesomeMarkers(~Longitude, ~Latitude, icon=icons, popup = ~as.character(Average.Score..SAT.Math.), label = ~paste0(School.Name, ": ", Average.Score..SAT.Math.)) # as.character() ignores a second argument, so paste0() combines name and score
In 2015, out of the 69 accredited schools in Queens that reported a math SAT score, only 15 (21.7%) scored at or above the national average. Visually, there seems to be a group of green markers in central Queens, with more red and orange markers toward the outskirts of the borough. The schools that scored at or above the national average are as follows:
[1] Aviation Career and Technical Education High School
[2] Bard High School Early College Queens
[3] Frank Sinatra School of the Arts High School
[4] Baccalaureate School for Global Education
[5] East-West School of International Studies
[6] Bayside High School
[7] Benjamin N. Cardozo High School
[8] Francis Lewis High School
[9] Queens School of Inquiry
[10] Townsend Harris High School
[11] Forest Hills High School
[12] Queens Gateway to Health Sciences Secondary School
[13] Thomas A. Edison Career and Technical Education High School
[14] Queens High School for the Sciences at York College
[15] Scholars’ Academy
Similar to Brooklyn, the names listed suggest that these schools are specialized or selective. Schools that can draw on funding beyond local and state government (e.g., grants or private donations) may find it easier to keep smaller student-to-teacher ratios or offer more resources, such as exam prep, to their students.
After Staten Island, Queens has the second-highest average math SAT score: the median is 448 and the mean is 462.36. Even so, the typical Queens school falls below the national average, by 63 points at the median and about 49 points at the mean.
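The gap between each borough and the national average is simple arithmetic on the medians and means printed in mean_median_df above. This sketch copies those values by hand; with the real objects, the same columns could be computed from mean_median_df directly.

```r
# Medians and means copied from the mean_median_df output in this report
national_math <- 511
borough_stats <- data.frame(
  Borough = c("Bronx", "Staten Island", "Queens", "Brooklyn", "Manhattan"),
  Median  = c(395.5, 465.5, 448.0, 395.0, 433.0),
  Mean    = c(404.3571, 486.2000, 462.3623, 416.4037, 455.8876)
)

# Points below the national average (a negative value would mean above it)
borough_stats$MedianGap <- national_math - borough_stats$Median
borough_stats$MeanGap   <- round(national_math - borough_stats$Mean, 1)
borough_stats
```

Every borough's median and mean fall below 511; for Queens the gaps are 63 and 48.6 points.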
The Pan American International High School in Queens has the lowest math SAT score. The school serves new immigrants from Latin America, and its demographic breakdown is 99.7% Hispanic and 0.3% Asian.
By contrast, the school with the highest math SAT score, Queens High School for the Sciences at York College, has a demographic breakdown of 75.4% Asian, 10.6% Hispanic, 7.0% Black, and 6.1% White.
queens_lowest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 340),])
queens_lowest
## School.ID School.Name Borough Building.Code
## 390 24Q296 Pan American International High School Queens Q744
## Street.Address City State Zip.Code Latitude Longitude
## 390 45-10 94th Street Elmhurst NY 11373 40.7433 -73.87057
## Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 390 718-271-3602 8:30 AM 3:15 PM 378 0.0%
## Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 390 0.0% 99.7% 0.3% 340
## Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 390 320 318 31.9%
queens_highest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 701 ),])
queens_highest
## School.ID School.Name Borough
## 425 28Q687 Queens High School for the Sciences at York College Queens
## Building.Code Street.Address City State Zip.Code Latitude
## 425 Q774 94-50 159th Street Jamaica NY 11433 40.701
## Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 425 -73.79815 718-657-3181 8:00 AM 3:18 PM 426
## Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 425 6.1% 7.0% 10.6% 75.4%
## Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 425 701 621
## Average.Score..SAT.Writing. Percent.Tested
## 425 625 97.9%
nrow(newdata_queens) # There are 69 rows in this data set
## [1] 69
queens_511 <- (newdata_queens[which(newdata_queens$Average.Score..SAT.Math. >= 511),]) # Schools that scored at or above 511
queens_511_schoolnames <- queens_511[,1] # The names of schools that scored at or above 511
nrow(newdata_queens[which(newdata_queens$Average.Score..SAT.Math. >= 511),]) # 15 high schools have a math SAT score at or above the national average (511)
## [1] 15
nrow(newdata_queens[which(newdata_queens$Average.Score..SAT.Math. < 511 ),]) # 54 high schools in Queens have a math SAT Score lower than the national average.
## [1] 54
nrow(newdata_queens[which(newdata_queens$Average.Score..SAT.Math. < 448),]) # 34 high schools in Queens have a math SAT score lower than 448.
## [1] 34
summary(newdata_queens$Average.Score..SAT.Math.)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 340.0 415.0 448.0 462.4 490.0 701.0
getColor2 <- function(newdata_queens) {
sapply(newdata_queens$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
if(Average.Score..SAT.Math. <= 415) {
"red"
} else if(Average.Score..SAT.Math. < 511) { # above the 1st quartile, below the national average
"orange"
} else { # at or above the national average (511)
"green"
} })
}
icons2 <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor2(newdata_queens)
)
leaflet(newdata_queens) %>% addTiles() %>%
addAwesomeMarkers(~Longitude, ~Latitude, icon=icons2, popup = ~as.character(Average.Score..SAT.Math.), label = ~paste0(School.Name, ": ", Average.Score..SAT.Math.)) # as.character() ignores a second argument, so paste0() combines name and score
In 2015, out of the 89 accredited high schools in Manhattan, only 18 (20.2%) scored at or above the national average. Visually, the majority of the green markers appear in lower Manhattan, with clusters of orange markers in central and upper Manhattan. Below are the schools that scored at or above the national average:
[1] New Explorations into Science, Technology and Math High School
[2] High School for Dual Language and Asian Studies
[3] Bard High School Early College
[4] Millennium High School
[5] Baruch College Campus High School
[6] School of the Future High School
[7] Manhattan Village Academy
[8] NYC Lab School for Collaborative Studies
[9] NYC Museum School
[10] NYC iSchool
[11] Eleanor Roosevelt High School
[12] Beacon High School
[13] Fiorello H. LaGuardia High School of Music and Art and Performing Arts
[14] Manhattan / Hunter Science High School
[15] Columbia Secondary School
[16] Manhattan Center for Science and Mathematics
[17] High School for Mathematics, Science, and Engineering at City College
[18] Stuyvesant High School
Bard High School Early College appears in both the Manhattan and Queens lists. The Queens entry is named Bard High School Early College Queens, which suggests two separate campuses rather than one school counted twice, but it is worth checking for double counting in this dataset.
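One quick way to check for double counting is to look for repeated school IDs or names, and to list every school whose name contains "Bard". The data frame below is a hypothetical stand-in: its IDs are placeholders, not the real School.ID values.

```r
# Hypothetical stand-in for clean_nycscores; IDs are placeholders
schools <- data.frame(
  School.ID   = c("ID-1", "ID-2", "ID-3"),
  School.Name = c("Bard High School Early College",
                  "Bard High School Early College Queens",
                  "Stuyvesant High School"),
  Borough     = c("Manhattan", "Queens", "Manhattan")
)

any(duplicated(schools$School.ID))     # FALSE: no ID appears twice
any(duplicated(schools$School.Name))   # FALSE: the two Bard names differ
schools[grepl("Bard", schools$School.Name), c("School.Name", "Borough")]
```

Run against clean_nycscores, two distinct School.ID values for the Bard entries would indicate two separate schools rather than a duplicated row.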
Similar to Brooklyn and Queens, the Manhattan schools that scored at or above the national average are mostly specialized or technical schools. This borough in particular has several schools affiliated with colleges, such as Bard High School Early College, Baruch College Campus High School, and the High School for Mathematics, Science, and Engineering at City College.
After further research into Bard High School Early College (BHSEC), I found that Bard relies on funding from public and private sources so that its costs are not passed on to students and families. In comparison, public schools often have limited funding and cannot rely on private sources or donations. This may or may not contribute to a higher average math SAT score. However, I find it important to probe these questions, as they add context about the school and how that context is associated with standardized test scores.
Looking more closely at the demographics of the schools with the highest and lowest math SAT scores, I found that the school with the lowest score, Coalition School for Social Change (344), has a demographic breakdown of 50.2% Hispanic, 41.0% Black, 4.2% Asian, and 3.5% White. Note that the lookup below matches three schools with a score of 344; the other two are in Brooklyn.
In comparison, the school that had the highest math SAT score, Stuyvesant High School, has a school demographic breakdown of 73.4% Asian, 20.4% White, 2.6% Hispanic, and 0.8% Black.
man_lowest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 344),]) # 344 is shared by three schools; only row 88 is in Manhattan
man_lowest
## School.ID School.Name Borough
## 88 04M409 Coalition School for Social Change Manhattan
## 303 14K685 El Puente Academy for Peace and Justice Brooklyn
## 343 17K524 International High School at Prospect Heights Brooklyn
## Building.Code Street.Address City State Zip.Code Latitude
## 88 M045 2351 1st Avenue Manhattan NY 10035 40.79887
## 303 K778 250 Hooper Street Brooklyn NY 11211 40.70577
## 343 K440 883 Classon Avenue Brooklyn NY 11225 40.67030
## Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 88 -73.93337 212-831-5153 8:45 AM 3:30 PM 283
## 303 -73.95573 718-387-1125 9:00 AM 3:30 PM 245
## 343 -73.96165 718-230-6333 8:30 AM 3:15 PM 414
## Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 88 3.5% 41.0% 50.2% 4.2%
## 303 1.2% 11.8% 85.3% 0.0%
## 343 13.3% 29.0% 38.9% 18.4%
## Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 88 344 368
## 303 344 380
## 343 344 302
## Average.Score..SAT.Writing. Percent.Tested
## 88 367 40.5%
## 303 379 62.5%
## 343 300 81.7%
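Matching on a single score such as 344 across the whole dataset can return schools from other boroughs, as the output above shows. A sketch of a more robust lookup, using a hypothetical helper borough_extremes(), filters to one borough first and then keeps every school tied for the lowest or highest score. The toy rows are invented for illustration.

```r
# Hypothetical helper: lowest- and highest-scoring schools within one borough
# (ties are kept, and schools in other boroughs are never matched).
borough_extremes <- function(df, borough, score_col = "Average.Score..SAT.Math.") {
  sub    <- df[df$Borough == borough, , drop = FALSE]
  scores <- sub[[score_col]]
  list(
    lowest  = sub[scores == min(scores), , drop = FALSE],
    highest = sub[scores == max(scores), , drop = FALSE]
  )
}

# Toy data (not from the dataset): a 344 tie across two boroughs
toy <- data.frame(
  School.Name = c("School A", "School B", "School C", "School D"),
  Borough     = c("Manhattan", "Brooklyn", "Manhattan", "Brooklyn"),
  Average.Score..SAT.Math. = c(344, 344, 754, 682)
)

ext <- borough_extremes(toy, "Manhattan")
ext$lowest$School.Name    # "School A" only
ext$highest$School.Name   # "School C"
```

With the real data, borough_extremes(clean_nycscores, "Manhattan")$lowest should return only the Coalition School for Social Change, because the other two schools scoring 344 are in Brooklyn.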
man_highest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 754 ),])
man_highest
## School.ID School.Name Borough Building.Code
## 106 02M475 Stuyvesant High School Manhattan M477
## Street.Address City State Zip.Code Latitude Longitude
## 106 345 Chambers Street Manhattan NY 10282 40.71775 -74.01405
## Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 106 212-312-4800 8:00 AM 3:30 PM 3296 20.4%
## Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 106 0.8% 2.6% 73.4% 754
## Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 106 697 693 97.4%
Source: https://bhsec.bard.edu/
nrow(newdata_manhattan) # There are 89 rows in this data set
## [1] 89
man_511 <- (newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. >= 511),])
man_511_schoolnames <- man_511[,1]
summary(newdata_manhattan$Average.Score..SAT.Math.) #Identifying the Min, Median, Mean, Max of Math SAT Scores, 511 is the average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 344.0 395.0 433.0 455.9 485.0 754.0
nrow(newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. >= 511),]) # 18 high schools have a SAT Score higher than the national average 511
## [1] 18
nrow(newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. < 511 ),]) # 71 high schools have a math SAT Score lower than the national average.
## [1] 71
nrow(newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. < 448),]) # 56 high schools in Manhattan have a math SAT score lower than 448.
## [1] 56
getColor_manhattan <- function(newdata_manhattan) {
sapply(newdata_manhattan$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
if( Average.Score..SAT.Math.<= 395) {
"red"
} else if(Average.Score..SAT.Math. < 511) { # above the 1st quartile, below the national average
"orange"
} else { # at or above the national average (511)
"green"
} })
}
icons_manhattan <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor_manhattan(newdata_manhattan)
)
leaflet(newdata_manhattan) %>% addTiles() %>%
addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_manhattan, popup = ~as.character(newdata_manhattan$School.Name), label=~as.character(Average.Score..SAT.Math.))
In 2015, Staten Island had a total of 10 reported accredited high schools. Out of the 10 schools, only one (10%) scored at or above the national average; the other nine scored below it. Given that there are so few accredited schools in Staten Island, there aren’t many visual patterns. However, the red-marker schools appear in the northern part of Staten Island.
The one school that scored above the national average was:
[1] Staten Island Technical High School
Given the small number of schools in Staten Island compared to the other boroughs, it is difficult to compare this borough’s average scores with the others. However, similar to Manhattan, Brooklyn, and Queens, the one school that did score above the national average was a technical school.
However, upon further research into this school, I found that Staten Island Technical High School is a top-rated public school. The school has around 1,300 students and a student-teacher ratio of 22 to 1. This research shows that it is important not to make assumptions about a school’s funding or about how it compares to private or specialized schools. However, since zip codes and property taxes often determine how well funded a school system is, I was curious about the demographic breakdown of Staten Island Technical High School.
Upon further investigation, I found that the demographic breakdown of Staten Island Technical High School is as follows: 52% White, 41% Asian, 5% Hispanic, and 1% Black. In addition, 99.7% of its students were tested. Although this is only one school out of the 375 in this dataset, it is interesting to note the demographics of the schools that scored above the national average.
Compare this with Ralph R. McKee Career and Technical Education High School, the school with the lowest math SAT score in Staten Island, whose demographic breakdown is 42.5% Hispanic, 30.8% Black, 19.7% White, and 4.8% Asian.
This comparison is not complete, but the demographic and location differences between the two schools are stark.
staten_island_tech <- (clean_nycscores[which(clean_nycscores$School.Name == "Staten Island Technical High School"),])
staten_island_tech #school with highest SAT math score
## School.ID School.Name Borough
## 111 31R605 Staten Island Technical High School Staten Island
## Building.Code Street.Address City State Zip.Code Latitude
## 111 R440 485 Clawson Street Staten Island NY 10306 40.56791
## Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 111 -74.11536 718-667-3222 7:45 AM 2:30 PM 1247
## Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 111 52.2% 1.0% 5.2% 41.1%
## Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 111 711 660
## Average.Score..SAT.Writing. Percent.Tested
## 111 670 99.7%
mckee_high <- (clean_nycscores[which(clean_nycscores$School.Name == "Ralph R. McKee Career and Technical Education High School"),]) # school with the lowest math SAT score in Staten Island
Source: https://www.niche.com/k12/staten-island-technical-high-school-staten-island-ny/
nrow(newdata_StatenIsland) # 10 schools are recorded in Staten Island.
## [1] 10
stat_511 <- (newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. >= 511),])
stat_511_schoolnames <- stat_511[,1] #The names of the schools that scored higher than 511
summary(newdata_StatenIsland$Average.Score..SAT.Math.)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 420.0 444.8 465.5 486.2 491.2 711.0
nrow(newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. >= 511),]) # 1 school is above the national average.
## [1] 1
nrow(newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. < 511 ),]) # 9 high schools are below the national average
## [1] 9
nrow(newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. < 444),]) # 3 high schools are below the 1st Q
## [1] 3
getColor_SI <- function(newdata_StatenIsland) {
sapply(newdata_StatenIsland$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
if(Average.Score..SAT.Math. <= 444) {
"red"
} else if(Average.Score..SAT.Math. < 511) { # above the 1st quartile, below the national average
"orange"
} else { # at or above the national average (511)
"green"
} })
}
icons_SI <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor_SI(newdata_StatenIsland)
)
leaflet(newdata_StatenIsland) %>% addTiles() %>%
addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_SI, popup = ~as.character(newdata_StatenIsland$School.Name), label=~as.character(Average.Score..SAT.Math.))
In 2015, out of the 98 accredited schools reported in the Bronx, only 3 (3.1%) scored at or above the national average. This map shows a pattern of red and orange markers in the South Bronx.
The schools that scored at or above the national average were:
[1] Bronx Center for Science and Mathematics
[2] Bronx High School of Science
[3] High School of American Studies at Lehman College
Similar to the other boroughs, the schools that scored at or above 511 are specialized or selective schools. This specialization could suggest that these schools focus on providing resources that help students succeed in mathematics or other technical fields.
However, compared to all the other boroughs, the Bronx had the smallest share of schools (3.1%) scoring at or above the national average. That is well below the share in Queens (21.7%), Manhattan (20.2%), Staten Island (10%), and Brooklyn (7.3%).
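The borough percentages quoted here follow directly from the counts reported in each borough section (schools at or above 511 divided by schools with scores). This sketch recomputes them from those counts, along with the citywide share.

```r
# Counts reported in each borough section of this report
at_or_above <- c(Queens = 15, Manhattan = 18, `Staten Island` = 1, Brooklyn = 8, Bronx = 3)
schools     <- c(Queens = 69, Manhattan = 89, `Staten Island` = 10, Brooklyn = 109, Bronx = 98)

pct <- round(100 * at_or_above / schools, 1)
pct   # Queens 21.7, Manhattan 20.2, Staten Island 10, Brooklyn 7.3, Bronx 3.1

round(100 * sum(at_or_above) / sum(schools), 1)   # citywide: 45 of 375 schools, 12%
```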
Although I do not believe that SAT scores truly measure the success of a school, I do believe that they can be a way to analyze and compare boroughs at an observational level.
I am also curious about the demographics of the schools with the highest and lowest math SAT scores. The school with the highest math SAT score in the Bronx is the Bronx High School of Science. Its demographic breakdown is as follows: 62.8% Asian, 22.1% White, 5.5% Hispanic, and 2.6% Black. The school with the lowest math SAT score is the Pan American International High School at Monroe, whose demographic breakdown is 100% Hispanic, 0% Asian, 0% Black, and 0% White. Further research showed that the Pan American International High School at Monroe offers new immigrants, including many unaccompanied minors, the chance to learn English and complete high school.
The Pan American International High School is a great example of why an SAT score alone cannot determine the success of a school. However, this marker may indicate that students who have recently moved to the US and are still learning English have more difficulty with the SAT and other standardized tests than students who grew up speaking English.
Source: https://panamericanihs.org/
bronx_highest <- clean_nycscores[which(clean_nycscores$Borough == "Bronx" & clean_nycscores$Average.Score..SAT.Math. == 714), ] # Bronx school with the highest SAT math score (714, the Bronx maximum)
bronx_highest
## School.ID School.Name Borough Building.Code
## 204 10X445 Bronx High School of Science Bronx X445
## Street.Address City State Zip.Code Latitude Longitude
## 204 75 West 205th Street Bronx NY 10468 40.87706 -73.88978
## Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 204 718-817-7700 8:00 AM 3:45 PM 3015 22.1%
## Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 204 2.6% 5.5% 62.8% 714
## Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 204 660 667 97.0%
bronx_lowest <- clean_nycscores[which(clean_nycscores$Borough == "Bronx" & clean_nycscores$Average.Score..SAT.Math. == 317), ] # Bronx school with the lowest SAT math score (317, the Bronx minimum)
bronx_lowest
## School.ID School.Name Borough
## 218 12X388 Pan American International High School at Monroe Bronx
## Building.Code Street.Address City State Zip.Code Latitude
## 218 X420 1300 Boynton Avenue Bronx NY 10472 40.83137
## Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 218 -73.87882 718-991-7238 8:30 AM 5:30 PM 428
## Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 218 0.0% 0.0% 100.0% 0.0%
## Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 218 317 315
## Average.Score..SAT.Writing. Percent.Tested
## 218 292 65.6%
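The same highest/lowest lookup could be repeated for every borough without hard-coding scores such as 714 or 317. This is a minimal base R sketch, assuming the column names shown in the output above; the function name `extreme_schools` and the example data are hypothetical.

```r
# Hypothetical helper: for each borough, return the school with the highest
# and the school with the lowest average SAT math score.
extreme_schools <- function(df, score_col = "Average.Score..SAT.Math.") {
  df <- df[!is.na(df[[score_col]]), ]  # drop schools without scores
  per_borough <- lapply(split(df, df$Borough), function(b) {
    top <- b[which.max(b[[score_col]]), ]
    bottom <- b[which.min(b[[score_col]]), ]
    top$Rank <- "highest"
    bottom$Rank <- "lowest"
    rbind(top, bottom)
  })
  out <- do.call(rbind, per_borough)
  rownames(out) <- NULL
  out
}

# Hypothetical example data (not from the NYC dataset)
example_schools <- data.frame(
  Borough = c("Bronx", "Bronx", "Bronx", "Queens", "Queens"),
  School.Name = c("School A", "School B", "School C", "School D", "School E"),
  Percent.Hispanic = c("5.5%", "60.0%", "100.0%", "20.0%", "45.0%"),
  Average.Score..SAT.Math. = c(714, 400, 317, 520, 450)
)
extreme_schools(example_schools)
```

Because `which.max()` and `which.min()` return only the first match, a tie for the top or bottom score would keep just one school; that is a deliberate simplification in this sketch.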
nrow(newdata_bronx) #98 schools are in this dataset
## [1] 98
bronx_511 <- newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. >= 511), ] # schools at or above the national average (511)
bronx_511_schoolnames <- bronx_511$School.Name # column 1 is School.ID, so select the name column explicitly
summary(newdata_bronx$Average.Score..SAT.Math.)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 317.0 378.2 395.5 404.4 418.0 714.0
nrow(newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. >= 511),]) # 3 schools are at or above the national average
## [1] 3
nrow(newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. < 511),]) # 95 schools are below the national average
## [1] 95
nrow(newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. < 444),]) # 88 schools scored below 444
## [1] 88
# Marker color for each Bronx school:
# red    = 378 or below (roughly the Bronx first quartile, 378.2)
# orange = above 378 but below the national average of 511
# green  = at or above 511, matching the ">= 511" count above
getColor_bronx <- function(schools) {
  scores <- schools$Average.Score..SAT.Math.
  ifelse(scores <= 378, "red",
         ifelse(scores < 511, "orange", "green"))
}
icons_bronx <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor_bronx(newdata_bronx)
)
leaflet(newdata_bronx) %>%
  addTiles() %>%
  addAwesomeMarkers(~Longitude, ~Latitude, icon = icons_bronx,
                    popup = ~as.character(School.Name),
                    label = ~as.character(Average.Score..SAT.Math.))
This project was a continuation of a project I did for Richard Ross’s class From Data to Knowledge. For that project, I analyzed the same data set, but focused on the demographic breakdown of schools and their corresponding SAT scores. My original question for that project was: Is there an association between the racial breakdown of a school and the average SAT Math Score in NYC in the 2014-2015 school year?
I used linear regression to examine the relationship between the percentage of each recorded race (“Asian”, “Black”, “Hispanic”, and “White”) and the average math SAT score. This model was interesting because it could moderately predict the average math SAT score from the percentages of Asian and White students in NYC schools. However, the project had many limitations. For this final project, I wanted to build on what I learned this year about visualization packages and apply it to the same dataset.
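For readers curious about the earlier regression, here is a minimal sketch of how such a model could be fit in R. It is not the model from that project: it assumes the percentage columns are stored as text such as "62.8%" (as in the printed output above) and uses only the Asian and White percentages as predictors. The helper names `parse_percent` and `fit_math_model` and the example data are hypothetical.

```r
# Convert strings such as "62.8%" into numbers such as 62.8
parse_percent <- function(x) as.numeric(sub("%", "", x, fixed = TRUE))

# Hypothetical helper: regress average SAT math score on the percentage of
# Asian and White students (rows with missing values are dropped by lm()).
fit_math_model <- function(df) {
  model_data <- data.frame(
    math = df$Average.Score..SAT.Math.,
    asian = parse_percent(df$Percent.Asian),
    white = parse_percent(df$Percent.White)
  )
  lm(math ~ asian + white, data = model_data)
}

# Hypothetical data built so that math = 400 + 2 * asian + 1 * white exactly
example_demo <- data.frame(
  Average.Score..SAT.Math. = c(425, 455, 470, 510, 520),
  Percent.Asian = c("10%", "20%", "30%", "40%", "50%"),
  Percent.White = c("5%", "15%", "10%", "30%", "20%")
)
fit <- fit_math_model(example_demo)
coef(fit)  # approximately 400, 2, 1
```

On real data, `summary(fit)` reports the estimated coefficients, their standard errors, and R-squared.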
Below I will detail my summary conclusions:
To begin, I want to reiterate that SAT scores are not a complete measure of a school’s success. I chose to focus on the math SAT score because I wanted to focus on a single variable. I believe that an SAT score is just a number that represents a small fraction of what a school can do; however, the unfortunate reality is that colleges and employers use SAT scores as an initial marker of a student’s potential. When a school or community does not have the resources to support preparation for these high-stakes standardized tests, students and schools are the ones that suffer in the long run. These scores help admit students into colleges and beyond, and it is for these reasons that I am compelled to analyze this data.
I used Leaflet to map the range of SAT scores. With this package, I could immediately see the schools that were above the national average (green markers) and the schools that were below it (red and orange markers). Leaflet allowed me to zoom in and out of each borough and see clusters of red/orange or green schools. Even though this was a good way to identify each school’s score, I was not able to clearly locate an area with “the best” SAT scores, so I had to rely on summary statistics to determine the number and percentage of schools with scores above the national average.
I also found that the schools with the top SAT math scores were technical, specialized, college preparatory, or private schools. This specialization could suggest that schools focus on providing resources that allow students to succeed in mathematics or other technical fields.
From my analysis, I found that Staten Island had the highest median and mean math SAT scores (465.5 and 486.2). However, there were only 10 accredited high schools in this borough. With 69 schools in the dataset, Queens had the second-highest median and mean math SAT scores (448 and 486). The Bronx had the lowest median and mean math SAT scores (395.5 and 404.35).
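The borough medians and means above could be produced in one table instead of one borough at a time. A minimal base R sketch, assuming the same column names as the cleaned dataset; `borough_summary` and the example data are hypothetical.

```r
# Hypothetical helper: number of schools with scores, plus the median and
# mean average SAT math score, for each borough.
borough_summary <- function(df) {
  df <- df[!is.na(df$Average.Score..SAT.Math.), ]
  by_borough <- split(df$Average.Score..SAT.Math., df$Borough)
  data.frame(
    Borough = names(by_borough),
    n_schools = vapply(by_borough, length, integer(1)),
    median = vapply(by_borough, median, numeric(1)),
    mean = vapply(by_borough, mean, numeric(1))
  )
}

# Hypothetical example data (not from the NYC dataset)
example_scores <- data.frame(
  Borough = c("Bronx", "Bronx", "Bronx", "Queens", "Queens"),
  Average.Score..SAT.Math. = c(714, 400, 317, 520, 450)
)
borough_summary(example_scores)
```

Sorting the resulting table by `median` or `mean` gives the borough ranking directly, and the `n_schools` column keeps small boroughs such as Staten Island visible when comparing averages.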
In addition to learning about the SAT math scores in each borough, I was also curious about the demographics of the schools with the highest and lowest SAT scores in each borough. I found this data by isolating the school with the highest and the school with the lowest SAT score in each borough and then reporting their racial percentages. Although this analysis was informative, it was not complete.
My analysis did show a trend: the schools with the highest SAT scores (Bronx High School of Science, Staten Island Technical High School, Stuyvesant High School, and Brooklyn Technical High School) had student bodies that were majority Asian and/or White. However, I am still not certain that this pattern is completely accurate, because the averages do not account for the students who did not take the SAT.
Finally, to learn about the schools with the lowest scores, I isolated the lowest-scoring school in each borough. I found that many of these schools were multilingual or were designed to serve immigrants (Pan American International High School at Monroe, Ralph R. McKee Career and Technical Education High School, Coalition School for Social, Pan American International High School at Queens, and Multicultural High School Brooklyn). These schools often had a majority Hispanic or Black student body. Knowing the context of a school can help put its SAT scores into perspective.
In the future, I would like to continue to learn about SAT score trends and how data can be used to support the schools that need the most help. If we can use data to identify school districts that are falling behind on test scores or lacking other resources, it is possible that local or state governments can provide funding or support.
Overall, this project only scratched the surface of my inquiries. Given the limited number of schools in each borough, my analysis was observational. However, I think the use of Leaflet and other summary analysis techniques helped me understand the geographic differences in SAT math test scores in New York City.