Background information

According to Kaggle.com, this dataset contains the average SAT scores for NYC public schools. The data are organized into 22 columns, which cover: department ID number, school name, borough, building code, street address, latitude/longitude coordinates, phone number, start and end times, student enrollment with race breakdown, and average scores on each SAT test section for the 2014-2015 school year. There are 435 rows, and each row represents one accredited high school in New York City. However, not all accredited schools have recorded SAT scores: 60 schools have no reported scores. For this project, I will remove the schools without SAT scores.

This dataset is not a sample. In 2015, the New York City Department of Education compiled and published the high school data, with the College Board supplying the SAT score averages and testing rates. Potential issues include a lack of file descriptions and column descriptions. Additionally, the update frequency of this dataset is not specified.

Question

I am interested in exploring and visualizing the range of math SAT scores in New York City. I am curious to know whether the best SAT scores are concentrated in a specific borough or area. I am also curious to know how many schools are at or above the national SAT math average. Finally, I am curious about the demographics of the schools with the highest and lowest SAT scores in each borough.

Exploratory Data Analysis

Cleaning the Data

library(leaflet)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
getwd()
## [1] "/Users/blaisesevier/Desktop/DS 4001"
nycscores <- read.csv("scoresnyc.csv", sep= ",") #This is the original dataset

any(is.na(nycscores[])) #checking for N/As --> TRUE
## [1] TRUE
sum(is.na(nycscores[])) #number of N/As --> 187//60 rows
## [1] 187
colSums(is.na(nycscores[])) #determining which columns 
##                   School.ID                 School.Name 
##                           0                           0 
##                     Borough               Building.Code 
##                           0                           0 
##              Street.Address                        City 
##                           0                           0 
##                       State                    Zip.Code 
##                           0                           0 
##                    Latitude                   Longitude 
##                           0                           0 
##                Phone.Number                  Start.Time 
##                           0                           0 
##                    End.Time          Student.Enrollment 
##                           0                           7 
##               Percent.White               Percent.Black 
##                           0                           0 
##            Percent.Hispanic               Percent.Asian 
##                           0                           0 
##    Average.Score..SAT.Math. Average.Score..SAT.Reading. 
##                          60                          60 
## Average.Score..SAT.Writing.              Percent.Tested 
##                          60                           0
clean_nycscores <- na.omit(nycscores) #removing all na's from the data set. 

nrow(clean_nycscores) #There are 375 rows now in this data set. 
## [1] 375
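
Because na.omit() drops a row when any column is missing, a more targeted alternative is to filter only on the SAT columns. The sketch below uses a small made-up data frame (toy, with hypothetical values) rather than the real file. In this dataset the two approaches give the same 375 rows, since 435 minus the 60 schools without scores is already 375.

```r
# Toy stand-in for the real file; every value here is made up for illustration.
toy <- data.frame(
  School.Name = c("A", "B", "C", "D"),
  Student.Enrollment = c(500, NA, 300, 250),
  Average.Score..SAT.Math. = c(480, 455, NA, 390),
  Average.Score..SAT.Reading. = c(470, 450, NA, 385),
  Average.Score..SAT.Writing. = c(465, 440, NA, 380)
)

# Columns that must be present for the analysis
sat_cols <- c("Average.Score..SAT.Math.",
              "Average.Score..SAT.Reading.",
              "Average.Score..SAT.Writing.")

# Keep rows where every SAT column is non-missing
has_scores <- complete.cases(toy[, sat_cols])
toy_clean <- toy[has_scores, ]

nrow(na.omit(toy))  # also drops school B because its enrollment is missing
nrow(toy_clean)     # keeps school B, whose SAT scores are present
```

This keeps the filter tied to the question being asked (does the school have SAT scores?) rather than to every column in the file.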

Condensing the Data

condensed_nyc <- clean_nycscores[ , c(2,3,9,10, 19,20,21)] #Condensed columns to School name, Borough, Latitude, Longitude, Average SAT Scores (Math, Reading, Writing).
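
Selecting columns by position works, but it silently breaks if the column order ever changes. A minimal sketch of selecting by name instead, using a made-up data frame (toy) whose column names mirror some of those in the real file:

```r
# Toy data frame; the rows are invented for illustration only.
toy <- data.frame(
  School.ID = c("X1", "X2"),
  School.Name = c("School A", "School B"),
  Borough = c("Queens", "Bronx"),
  Latitude = c(40.70, 40.85),
  Longitude = c(-73.80, -73.90),
  Average.Score..SAT.Math. = c(480, 390),
  stringsAsFactors = FALSE
)

# Name the columns to keep instead of relying on their positions
keep <- c("School.Name", "Borough", "Latitude", "Longitude",
          "Average.Score..SAT.Math.")
toy_condensed <- toy[, keep]

names(toy_condensed)
```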

Dataset Explanation

For the purposes of this project, I am going to focus on one SAT subject area, math. Although it would be beneficial to study all three of the subject areas, I feel that it is important to focus on one subject area to ensure a thorough analysis.

In 2015, a record 1.70 million students from the class of 2015 took the SAT. According to collegeboard.org, the math SAT national average was 511.

Below, you will see a variety of data exploration models. For each borough you will see a summary model that explains the condensed_nyc’s Average Score SAT Math minimum, 1st Quartile, Median, Mean, 3rd Quartile, and Max.

Before every borough analysis, I will run this summary to get an idea of the spread of the data. Then, you will see a map that shows each school’s name, location, and math SAT score. The SAT score determines the color of each icon.

An icon is red if the school’s math SAT score is at or below the 1st quartile of the data being mapped (386 for the full dataset). An icon is orange if the school’s score is above that quartile but lower than the national average (511). An icon is green if the school’s score is equal to or higher than the national average.
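
The color rule can be written as one small vectorized helper. This is only a sketch: score_color and its arguments are hypothetical names, and it assumes that a score exactly at the 1st quartile counts as red and a score exactly at 511 counts as green.

```r
# Hypothetical helper; the name and arguments are illustrative.
# Assumes: at or below the first quartile -> red,
#          below the national average     -> orange,
#          at or above the national average -> green.
score_color <- function(scores, first_quartile, national_avg = 511) {
  ifelse(scores <= first_quartile, "red",
         ifelse(scores < national_avg, "orange", "green"))
}

# Made-up scores that exercise each boundary
example_scores <- c(350, 386, 450, 511, 700)
score_color(example_scores, first_quartile = 386)
```

Passing the quartile as an argument means the same helper could serve the full dataset and each borough, instead of one function per map.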

Before looking into each borough, I will start by looking at the entire dataset to get an idea of the spread of the math SAT scores, and the general marker patterns.

Disclaimer: I do not believe that test scores are a complete representation of a school’s success. Judging the success of a school solely on the SAT ignores many other markers. Since this is just a surface-level analysis, I believe that looking at a school’s average math SAT score is one way to start learning more about a school.

summary(condensed_nyc$Average.Score..SAT.Math.) #Total Average 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   317.0   386.0   415.0   432.9   458.5   754.0


getColor_set <- function(condensed_nyc) {
  sapply(condensed_nyc$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
  if(Average.Score..SAT.Math. <= 386 ) {
    "red"
  } else if( Average.Score..SAT.Math. < 511) { # below the national average; 511 itself is green
    "orange"
  } else {
    "green"
  } })
}

icons_set <- awesomeIcons(
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = getColor_set(condensed_nyc)
)

leaflet(condensed_nyc) %>% addTiles() %>%
  addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_set, popup = ~as.character(condensed_nyc$School.Name), label=~as.character(Average.Score..SAT.Math.))

Entire Set Breakdown

The unfortunate reality of this map is the enormous number of red and orange markers. Out of the 375 accredited high schools with reported scores, only 45 (12%) were at or above the national average. This means that 88% of these schools were below the national average for the 2015 math SAT.

If you zoom in on the map of NYC, you can see some clusters of green, orange, and red markers.

nrow(condensed_nyc[which(condensed_nyc$Average.Score..SAT.Math. >= 511),]) # 45 schools are above the National Average
## [1] 45
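
The count and the percentage can be computed together, since the mean of a logical vector is the share of TRUE values. The sketch below uses made-up scores (toy_scores) and then re-checks the 45 out of 375 figure quoted in the text.

```r
# Made-up scores for illustration only
toy_scores <- c(400, 520, 455, 511, 380, 610, 430, 470)

at_or_above <- toy_scores >= 511   # TRUE where a school meets the national average
sum(at_or_above)                   # number of schools at or above 511
round(100 * mean(at_or_above), 1)  # the same thing as a percentage

# Re-checking the share reported in the text: 45 of 375 schools
round(100 * 45 / 375, 1)
```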

Separating the Data by Borough

This chunk is the first step toward understanding and comparing the SAT scores across boroughs. However, it is important to note how the number of reported schools changed after I removed the rows with N/As.

The breakdown is as follows: 20 schools (17%) were removed from the Bronx, 12 (9.9%) from Brooklyn, 17 (16%) from Manhattan, and 11 (14%) from Queens. No schools were removed from Staten Island.

Given that a different number of schools was removed from each borough, it is important to factor this missing information into the analysis that follows. This gap also raises the question of why 60 accredited schools in NYC did not report their data.
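
The per-borough figures can be recomputed from the borough counts that summary() reports in this section before and after cleaning. A short sketch of that arithmetic (the vector names are illustrative):

```r
# Borough counts taken from the summary() output in this section
before <- c(Bronx = 118, Brooklyn = 121, Manhattan = 106,
            Queens = 80, "Staten Island" = 10)
after  <- c(Bronx = 98, Brooklyn = 109, Manhattan = 89,
            Queens = 69, "Staten Island" = 10)

removed <- before - after            # schools dropped per borough
pct_removed <- 100 * removed / before  # share of each borough's schools dropped

data.frame(before, after, removed, pct_removed = round(pct_removed, 1))
sum(removed)                         # total schools dropped across the city
```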

summary(nycscores$Borough) #BEFORE THE CLEANED DATA
##         Bronx      Brooklyn     Manhattan        Queens Staten Island 
##           118           121           106            80            10
summary(condensed_nyc$Borough) #AFTER THE CLEANED DATA
##         Bronx      Brooklyn     Manhattan        Queens Staten Island 
##            98           109            89            69            10


newdata_brooklyn <- condensed_nyc[ which(condensed_nyc$Borough =='Brooklyn'),] #separation for future visuals

newdata_manhattan <-condensed_nyc[which(condensed_nyc$Borough == "Manhattan"),]

newdata_bronx <-condensed_nyc[which(condensed_nyc$Borough == "Bronx"),]

newdata_StatenIsland <-condensed_nyc[which(condensed_nyc$Borough == "Staten Island"),]

newdata_queens <- condensed_nyc[which(condensed_nyc$Borough == "Queens"),] 
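
As an alternative to five separate subsetting statements, base R's split() returns one data frame per borough in a named list. A minimal sketch on made-up rows (toy and by_borough are illustrative names):

```r
# Toy rows; values are invented for illustration.
toy <- data.frame(
  School.Name = c("A", "B", "C", "D"),
  Borough = c("Queens", "Bronx", "Queens", "Brooklyn"),
  Average.Score..SAT.Math. = c(480, 390, 520, 410),
  stringsAsFactors = FALSE
)

# One data frame per borough, stored in a named list
by_borough <- split(toy, toy$Borough)

names(by_borough)             # the boroughs present in the data
nrow(by_borough[["Queens"]])  # number of Queens schools in the toy data
```

A list like this can then be passed to lapply() or sapply() to run the same summary for every borough.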

library(ggplot2)

2015 Math SAT Information

The National Average for the 2015 SAT Score for Math, Reading and Writing was 511, 495, and 484 respectively. Below you will find maps with markers for each accredited high school in Brooklyn, Queens, Manhattan, Staten Island, and Bronx. Before the visual, I have provided a summary of the minimum, median, mean, and max scores.

Additionally, below is a boxplot that shows the summary data for the NYC SAT math scores divided by borough. Staten Island has the highest median math SAT score (465.5). Brooklyn has the lowest median (395.0), just below the Bronx (395.5), while the Bronx has the lowest mean (404.4). However, the number of schools recorded in each borough is vastly different, which makes it difficult to compare the boroughs.

bp <- ggplot(clean_nycscores, aes(x=Borough, y =Average.Score..SAT.Math., group = Borough)) + geom_boxplot(aes(fill = Borough)) + labs(title = "Average SAT Math Scores", x = "Borough", y = "MATH SAT SCORE")
  
bp #This is a boxplot that graphs the Average SAT Math Scores and compares the median by Borough

med_bronx <- median(newdata_bronx$Average.Score..SAT.Math.) #set up for median and mean comparison tables
mean_bronx <-mean(newdata_bronx$Average.Score..SAT.Math.)

med_SI <- median(newdata_StatenIsland$Average.Score..SAT.Math.)
mean_SI <-mean(newdata_StatenIsland$Average.Score..SAT.Math.)

med_queens <-median(newdata_queens$Average.Score..SAT.Math.)
mean_queens<-mean(newdata_queens$Average.Score..SAT.Math.)

med_brook <- median(newdata_brooklyn$Average.Score..SAT.Math.)
mean_brook <- mean(newdata_brooklyn$Average.Score..SAT.Math.)

med_man <- median(newdata_manhattan$Average.Score..SAT.Math.)
mean_man <- mean(newdata_manhattan$Average.Score..SAT.Math.)

cbind("Bronx", "Staten Island", "Queens", "Brooklyn", "Manhattan")
##      [,1]    [,2]            [,3]     [,4]       [,5]       
## [1,] "Bronx" "Staten Island" "Queens" "Brooklyn" "Manhattan"
mean_median_df <- data.frame("Borough" = rbind("Bronx", "Staten Island", "Queens", "Brooklyn", "Manhattan"), "Median" = rbind(med_bronx, med_SI, med_queens, med_brook, med_man), "Mean" = rbind(mean_bronx, mean_SI, mean_queens, mean_brook, mean_man))

mean_median_df#table that shows the Median/Mean of all the boroughs
##                  Borough Median     Mean
## med_bronx          Bronx  395.5 404.3571
## med_SI     Staten Island  465.5 486.2000
## med_queens        Queens  448.0 462.3623
## med_brook       Brooklyn  395.0 416.4037
## med_man        Manhattan  433.0 455.8876
mean_graph <- ggplot(mean_median_df, aes(x=Borough, y = Mean)) + geom_bar(stat= "identity") + labs(title = "Mean SAT Score for NY Accredited Schools")

median_graph <- ggplot(mean_median_df, aes(x=Borough, y = Median)) + geom_bar(stat= "identity") + labs(title = "Median SAT Score for NY Accredited Schools")
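
The same median/mean table can be built without computing ten separate variables by grouping with tapply(). The sketch below uses made-up rows (toy); summary_df is an illustrative name.

```r
# Toy rows; values are invented for illustration.
toy <- data.frame(
  Borough = c("Bronx", "Bronx", "Queens", "Queens", "Queens"),
  Average.Score..SAT.Math. = c(380, 420, 450, 470, 520)
)

# Apply median and mean to the scores within each borough
medians <- tapply(toy$Average.Score..SAT.Math., toy$Borough, median)
means   <- tapply(toy$Average.Score..SAT.Math., toy$Borough, mean)

summary_df <- data.frame(Borough = names(medians),
                         Median = as.vector(medians),
                         Mean = as.vector(means))
summary_df
```

Note also that a ggplot object assigned to a variable is only drawn when it is printed, for example by evaluating mean_graph on a line of its own.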

Now that we have analyzed the full dataset, we will go through each borough in turn.

As a reminder: a marker is red if the school’s score is at or below the 1st quartile of the data shown on that map (the borough’s own 1st quartile on the borough maps). A marker is orange if the score is above that quartile but below the national average (511). A marker is green if the score is at or above the national average.

The national averages come from: https://secure-media.collegeboard.org/digitalServices/pdf/sat/total-group-2015.pdf

Brooklyn

In 2015, out of the 109 accredited high schools in Brooklyn with reported scores, only 8 (7.3%) scored at or above the national average of 511. From this map, you can see the sea of red and orange markers. There doesn’t seem to be much of a pattern among the schools with a green marker, but there is a large number of accredited high schools in the northeastern part of Brooklyn with below-average math SAT scores.

The schools that scored higher than the national average are as follows:

[1] Brooklyn Latin School
[2] Fort Hamilton High School
[3] Midwood High School
[4] Millennium Brooklyn High School
[5] Brooklyn Technical High School
[6] John Dewey High School
[7] Medgar Evers College Preparatory School
[8] Leon M. Goldstein High School for the Sciences

The names of some of these schools suggest that they are specialized schools or private schools. If so, they may have access to different or more resources than the average public school.

Demographic Breakdown

The school with the lowest math SAT score in Brooklyn was the Multicultural High School. Its demographic breakdown is as follows: 99.6% Hispanic and 0.4% White.

The school with the highest SAT score in Brooklyn was the Brooklyn Technical High School. The school demographic breakdown is as follows: 60.5% Asian, 20.5% White, 7.7% Black and 7.6% Hispanic.

brooklyn_lowest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 319),])
brooklyn_lowest
##     School.ID               School.Name  Borough Building.Code
## 290    19K583 Multicultural High School Brooklyn          K420
##         Street.Address     City State Zip.Code Latitude Longitude
## 290 999 Jamaica Avenue Brooklyn    NY    11208 40.69114 -73.86843
##     Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 290 718-827-2796    8:15 AM  3:15 PM                229          0.4%
##     Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 290          0.0%            99.6%          0.0%                      319
##     Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 290                         323                         284          28.6%
brooklyn_highest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 682 ),])
brooklyn_highest
##     School.ID                    School.Name  Borough Building.Code
## 327    13K430 Brooklyn Technical High School Brooklyn          K430
##         Street.Address     City State Zip.Code Latitude Longitude
## 327 29 Ft Greene Place Brooklyn    NY    11217 40.68811 -73.97675
##     Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 327 718-804-6400    8:45 AM  3:15 PM               5447         20.5%
##     Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 327          7.7%             7.6%         60.5%                      682
##     Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 327                         608                         606          95.5%
brooklyn_511 <- (newdata_brooklyn[which(newdata_brooklyn$Average.Score..SAT.Math. >= 511),]) #Schools with an SAT score higher than 511. 

brooklyn_511_schoolnames <- brooklyn_511[,1] #the names of the schools with a score higher than 511

nrow(newdata_brooklyn[which(newdata_brooklyn$Average.Score..SAT.Math. >= 511),]) # 8 number of schools > 511 
## [1] 8
nrow(newdata_brooklyn[which(newdata_brooklyn$Average.Score..SAT.Math. < 511 ),]) # 101 schools are below 511
## [1] 101
summary(newdata_brooklyn$Average.Score..SAT.Math.) #Identifying the Min, Median, Mean, Max of Math SAT Scores, 511 is the average
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   319.0   379.0   395.0   416.4   441.0   682.0

getColor <- function(newdata_brooklyn) {
  sapply(newdata_brooklyn$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
  if(Average.Score..SAT.Math. <= 379) {
    "red"
  } else if(Average.Score..SAT.Math. < 511 ) {
    "orange"
  } else { ## 511 (the national average) or higher
    "green"
  } })
}

icons <- awesomeIcons( #this is creating the icons.
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = getColor(newdata_brooklyn) #this refers back to the getColor function
)

leaflet(newdata_brooklyn) %>% addTiles() %>% #the map!
  addAwesomeMarkers(~Longitude, ~Latitude, icon=icons, popup = ~as.character(newdata_brooklyn$Average.Score..SAT.Math.),  label = ~paste0(School.Name, ": ", Average.Score..SAT.Math.)) #label shows the school name and its score
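
A marker label that shows both the school name and its score has to be a single string per school, which paste0() can build. The sketch below shows only the string-building step on made-up rows; the label column is an illustrative name, and the leaflet call itself is not reproduced here.

```r
# Toy rows; values are invented for illustration.
toy <- data.frame(
  School.Name = c("School A", "School B"),
  Average.Score..SAT.Math. = c(480, 390),
  stringsAsFactors = FALSE
)

# One label per school, combining name and score
toy$label <- paste0(toy$School.Name, ": ", toy$Average.Score..SAT.Math.)
toy$label
```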

Queens

In 2015, out of the 69 accredited schools that reported a math SAT score, only 15 (21.7%) scored at or above the national average. In terms of a visual pattern, there seems to be a section of green markers in the centermost part of Queens and more red and orange markers on the outskirts of the borough. The schools that scored higher than the national average are as follows:

[1] Aviation Career and Technical Education High School
[2] Bard High School Early College Queens
[3] Frank Sinatra School of the Arts High School
[4] Baccalaureate School for Global Education
[5] East-West School of International Studies
[6] Bayside High School
[7] Benjamin N. Cardozo High School
[8] Francis Lewis High School
[9] Queens School of Inquiry
[10] Townsend Harris High School
[11] Forest Hills High School
[12] Queens Gateway to Health Sciences Secondary School
[13] Thomas A. Edison Career and Technical Education High School
[14] Queens High School for the Sciences at York College
[15] Scholars’ Academy

Similar to Brooklyn, the school names listed suggest that these schools are either specialized or private. Because private or specialized schools often receive funding from sources other than local/state governments (e.g., tuition, grants), it may be easier for them to maintain smaller student/teacher ratios or to offer more resources (e.g., exam prep) to their students.

After Staten Island, Queens has the second highest average math SAT score. The median score is 448 and the mean is 462.36. Although these are the second highest in the city, they are still 63 points and about 49 points below the national average of 511, respectively.
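
The gap between the Queens figures and the national average can be checked with a short calculation. The values are the Queens median and mean reported in the summary table earlier; the variable names are illustrative.

```r
national_avg  <- 511       # 2015 national math SAT average cited in the text
queens_median <- 448       # from the median/mean table
queens_mean   <- 462.3623  # from the median/mean table

median_gap <- national_avg - queens_median  # points the median sits below 511
mean_gap   <- national_avg - queens_mean    # points the mean sits below 511

median_gap
round(mean_gap, 1)
```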

Demographic Analysis

The Pan American International High School in Queens has the lowest math SAT score. Since this school is specifically for new immigrants from Latin America, the school demographic breakdown is as follows: 99.7% Hispanic and 0.3% Asian.

This compares to the school with the highest SAT math score, Queens High School for the Sciences at York College. The school demographic breakdown is 75.4% Asian, 10.6% Hispanic, 6.1% White, and 7.0% Black.

queens_lowest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 340),])
queens_lowest
##     School.ID                            School.Name Borough Building.Code
## 390    24Q296 Pan American International High School  Queens          Q744
##        Street.Address     City State Zip.Code Latitude Longitude
## 390 45-10 94th Street Elmhurst    NY    11373  40.7433 -73.87057
##     Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 390 718-271-3602    8:30 AM  3:15 PM                378          0.0%
##     Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 390          0.0%            99.7%          0.3%                      340
##     Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 390                         320                         318          31.9%
queens_highest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 701 ),])
queens_highest
##     School.ID                                         School.Name Borough
## 425    28Q687 Queens High School for the Sciences at York College  Queens
##     Building.Code     Street.Address    City State Zip.Code Latitude
## 425          Q774 94-50 159th Street Jamaica    NY    11433   40.701
##     Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 425 -73.79815 718-657-3181    8:00 AM  3:18 PM                426
##     Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 425          6.1%          7.0%            10.6%         75.4%
##     Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 425                      701                         621
##     Average.Score..SAT.Writing. Percent.Tested
## 425                         625          97.9%
nrow(newdata_queens) # There are 69 rows in this data set
## [1] 69
queens_511 <- (newdata_queens[which(newdata_queens$Average.Score..SAT.Math. >= 511),]) #Schools that scored higher than 511
queens_511_schoolnames <- queens_511[,1] # The names of schools that scored higher than 511

nrow(newdata_queens[which(newdata_queens$Average.Score..SAT.Math. >= 511),]) # 15 high schools have a SAT Score higher than  the national average 511
## [1] 15
nrow(newdata_queens[which(newdata_queens$Average.Score..SAT.Math. < 511 ),]) # 54 high schools in Queens have a math SAT Score lower than the national average. 
## [1] 54
nrow(newdata_queens[which(newdata_queens$Average.Score..SAT.Math. < 448),]) # 34 high schools in Queens have a math SAT score lower than 448. 
## [1] 34
summary(newdata_queens$Average.Score..SAT.Math.) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   340.0   415.0   448.0   462.4   490.0   701.0

getColor2 <- function(newdata_queens) {
  sapply(newdata_queens$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
  if(Average.Score..SAT.Math. <= 415) {
    "red"
  } else if(Average.Score..SAT.Math. < 511 ) { # below the national average; 511 itself is green
    "orange"
  } else { 
    "green"
  } })
}

icons2 <- awesomeIcons(
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = getColor2(newdata_queens)
)

leaflet(newdata_queens) %>% addTiles() %>%
  addAwesomeMarkers(~Longitude, ~Latitude, icon=icons2, popup = ~as.character(newdata_queens$Average.Score..SAT.Math.),  label = ~paste0(School.Name, ": ", Average.Score..SAT.Math.)) #label shows the school name and its score

Manhattan

In 2015, out of the 89 accredited high schools in Manhattan with reported scores, only 18 (20.2%) scored at or above the national average. Visually, it looks like the majority of the green markers are found in lower Manhattan. There are clusters of orange markers in central and upper Manhattan. Below are the schools that scored higher than the national average:

[1] New Explorations into Science, Technology and Math High School
[2] High School for Dual Language and Asian Studies
[3] Bard High School Early College #a similarly named school appears in the Queens list
[4] Millennium High School
[5] Baruch College Campus High School
[6] School of the Future High School
[7] Manhattan Village Academy
[8] NYC Lab School for Collaborative Studies
[9] NYC Museum School
[10] NYC iSchool
[11] Eleanor Roosevelt High School
[12] Beacon High School
[13] Fiorello H. LaGuardia High School of Music and Art and Performing Arts
[14] Manhattan / Hunter Science High School
[15] Columbia Secondary School
[16] Manhattan Center for Science and Mathematics
[17] High School for Mathematics, Science, and Engineering at City College
[18] Stuyvesant High School

A similarly named school, Bard High School Early College Queens, appears in the Queens list. The different names suggest that these are separate campuses rather than one school listed in both boroughs, but it is worth checking the School.ID values in case any high schools are double counted in this data set.

Analysis

As in Brooklyn and Queens, many of the schools that scored higher than the national average appear to be specialized or technical schools. This borough in particular includes several schools that are associated with colleges (for example, Bard High School Early College).

After further research into Bard High School Early College (BHSEC), I found that, according to the school, Bard relies on funding from public and private sources to ensure that these costs are not passed on to students and families. In comparison, public schools often have a limited funding source and cannot rely on private sources or donations. This may or may not be a reason for a higher average math SAT score. However, I find it important to probe these questions, as they give more weight to the context of the school and how that context is associated with standardized test scores.

Demographic Analysis

Looking more closely into the demographics of the highest and the lowest math SAT scores, I have found that the school with the lowest math SAT score, Coalition School for Social Change, has a demographic breakdown of 50.2% Hispanic, 41% Black, 4.2% Asian, and 3.5% White.

In comparison, the school that had the highest math SAT score, Stuyvesant High School, has a school demographic breakdown of 73.4% Asian, 20.4% White, 2.6% Hispanic, and 0.8% Black.

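#Note: this matches a score of 344 in every borough, so two Brooklyn schools are returned as well; row 88 is the Manhattan school
man_lowest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 344),])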
man_lowest
##     School.ID                                   School.Name   Borough
## 88     04M409            Coalition School for Social Change Manhattan
## 303    14K685       El Puente Academy for Peace and Justice  Brooklyn
## 343    17K524 International High School at Prospect Heights  Brooklyn
##     Building.Code     Street.Address      City State Zip.Code Latitude
## 88           M045    2351 1st Avenue Manhattan    NY    10035 40.79887
## 303          K778  250 Hooper Street  Brooklyn    NY    11211 40.70577
## 343          K440 883 Classon Avenue  Brooklyn    NY    11225 40.67030
##     Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 88  -73.93337 212-831-5153    8:45 AM  3:30 PM                283
## 303 -73.95573 718-387-1125    9:00 AM  3:30 PM                245
## 343 -73.96165 718-230-6333    8:30 AM  3:15 PM                414
##     Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 88           3.5%         41.0%            50.2%          4.2%
## 303          1.2%         11.8%            85.3%          0.0%
## 343         13.3%         29.0%            38.9%         18.4%
##     Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 88                       344                         368
## 303                      344                         380
## 343                      344                         302
##     Average.Score..SAT.Writing. Percent.Tested
## 88                          367          40.5%
## 303                         379          62.5%
## 343                         300          81.7%
man_highest <- (clean_nycscores[which(clean_nycscores$Average.Score..SAT.Math. == 754 ),])
man_highest
##     School.ID            School.Name   Borough Building.Code
## 106    02M475 Stuyvesant High School Manhattan          M477
##          Street.Address      City State Zip.Code Latitude Longitude
## 106 345 Chambers Street Manhattan    NY    10282 40.71775 -74.01405
##     Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 106 212-312-4800    8:00 AM  3:30 PM               3296         20.4%
##     Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 106          0.8%             2.6%         73.4%                      754
##     Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 106                         697                         693          97.4%
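
Looking a school up by a hard-coded score matches every school in the city with that score, which is why the lookup for 344 above also returns two Brooklyn schools. A sketch of an alternative that first restricts the data to one borough and then uses which.min() and which.max(); the rows in toy are made up and only mimic the tie.

```r
# Toy rows; values are invented, with a deliberate tie at 344 across boroughs.
toy <- data.frame(
  School.Name = c("A", "B", "C", "D"),
  Borough = c("Manhattan", "Brooklyn", "Manhattan", "Brooklyn"),
  Average.Score..SAT.Math. = c(344, 344, 754, 319),
  stringsAsFactors = FALSE
)

# Restrict to one borough before looking for the extremes
manhattan <- toy[toy$Borough == "Manhattan", ]
lowest  <- manhattan[which.min(manhattan$Average.Score..SAT.Math.), ]
highest <- manhattan[which.max(manhattan$Average.Score..SAT.Math.), ]

lowest$School.Name
highest$School.Name
```

which.min() and which.max() return only the first match, so ties within a borough would still need a separate check.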

Source: https://bhsec.bard.edu/

nrow(newdata_manhattan) #89 There are 89 rows in this data set
## [1] 89
man_511 <- (newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. >= 511),])
man_511_schoolnames <- man_511[,1]

summary(newdata_manhattan$Average.Score..SAT.Math.) #Identifying the Min, Median, Mean, Max of Math SAT Scores, 511 is the average
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   344.0   395.0   433.0   455.9   485.0   754.0

nrow(newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. >= 511),]) # 18 high schools have a SAT Score higher than  the national average 511
## [1] 18
nrow(newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. < 511 ),]) # 71 high schools have a math SAT Score lower than the national average. 
## [1] 71
nrow(newdata_manhattan[which(newdata_manhattan$Average.Score..SAT.Math. < 448),]) # 56 high schools in Manhattan have a math SAT score lower than 448. 
## [1] 56
getColor_manhattan <- function(newdata_manhattan) {
  sapply(newdata_manhattan$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
  if( Average.Score..SAT.Math.<= 395) {
    "red"
  } else if(Average.Score..SAT.Math. < 511) { # below the national average; 511 itself is green
    "orange"
  } else {
    "green"
  } })
}

icons_manhattan <- awesomeIcons(
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = getColor_manhattan(newdata_manhattan)
)

leaflet(newdata_manhattan) %>% addTiles() %>%
  addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_manhattan, popup = ~as.character(newdata_manhattan$School.Name), label=~as.character(Average.Score..SAT.Math.))

Staten Island

In 2015, Staten Island had a total of 10 accredited high schools with reported scores. Out of the 10 schools, only one (10%) is above the national average; the other nine are below it. Given that there are so few accredited schools in Staten Island, there aren’t many visual patterns. However, the red-marker schools are in the northern section of Staten Island.

The one school that scored above the national average was:

[1] Staten Island Technical High School

Analysis

Given that there are a small number of schools in Staten Island compared to the other boroughs, it is difficult to compare the average scores of this borough to those of the other boroughs. However, as in Manhattan, Brooklyn, and Queens, the one school that did score above the national average was a technical school.

However, upon further research into this school, I found that Staten Island Technical High School is a top-rated public school. The school has around 1,300 students with a student-teacher ratio of 22 to 1. This research demonstrates that it is important not to make assumptions about the funding of a school and how it compares to private or specialized schools. Still, since zip codes and property taxes often determine how well funded a school system is, I am curious to know the demographic breakdown of Staten Island Technical High School.

Upon further investigation, I found that the demographic breakdown of Staten Island Technical High School is as follows: 52% White, 41% Asian, 5% Hispanic, and 1% Black. 99.7% of the students were tested. Although this is only one of the 375 schools in this dataset, it is interesting to note the demographic reality of the schools that scored higher than the national average.

In comparison, Ralph R. McKee Career and Technical Education High School had the lowest SAT math score in Staten Island. Its demographic breakdown is as follows: 42.5% Hispanic, 30.8% Black, 19.7% White, and 4.8% Asian.

This comparison is not complete, but it is interesting to note the stark demographic differences between the schools as well as the location differences.

staten_island_tech <- (clean_nycscores[which(clean_nycscores$School.Name == "Staten Island Technical High School"),])

staten_island_tech #school with highest SAT math score
##     School.ID                         School.Name       Borough
## 111    31R605 Staten Island Technical High School Staten Island
##     Building.Code     Street.Address          City State Zip.Code Latitude
## 111          R440 485 Clawson Street Staten Island    NY    10306 40.56791
##     Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 111 -74.11536 718-667-3222    7:45 AM  2:30 PM               1247
##     Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 111         52.2%          1.0%             5.2%         41.1%
##     Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 111                      711                         660
##     Average.Score..SAT.Writing. Percent.Tested
## 111                         670          99.7%
port_richmond_high <- (clean_nycscores[which(clean_nycscores$School.Name == "Ralph R. McKee Career and Technical Education High School"),]) # school with lowest SAT math score (the filter selects Ralph R. McKee, despite the variable name)

Source: https://www.niche.com/k12/staten-island-technical-high-school-staten-island-ny/

nrow(newdata_StatenIsland) # 10 schools are recorded in Staten Island.
## [1] 10
stat_511 <- (newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. >= 511),])
stat_511_schoolnames <- stat_511$School.Name # names of the schools that scored 511 or higher (column 1 is School.ID, not the name)

summary(newdata_StatenIsland$Average.Score..SAT.Math.)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   420.0   444.8   465.5   486.2   491.2   711.0

nrow(newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. >= 511),]) # 1 school is above the national average. 
## [1] 1
nrow(newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. < 511 ),]) # 9 high schools are below the national average
## [1] 9
nrow(newdata_StatenIsland[which(newdata_StatenIsland$Average.Score..SAT.Math. < 444),]) # 3 high schools are below the 1st Q
## [1] 3
getColor_SI <- function(newdata_StatenIsland) {
  sapply(newdata_StatenIsland$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
  if(Average.Score..SAT.Math. <= 444) {
    "red"
  } else if(Average.Score..SAT.Math. <= 511) {
    "orange"
  } else {
    "green"
  } })
}

icons_SI <- awesomeIcons(
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = getColor_SI(newdata_StatenIsland)
)

leaflet(newdata_StatenIsland) %>% addTiles() %>%
  addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_SI, popup = ~as.character(newdata_StatenIsland$School.Name), label=~as.character(Average.Score..SAT.Math.))

Bronx

In 2015, only 3 of the 98 accredited Bronx schools reported (3.06%) scored above the national average. This map shows a visual pattern of red and orange marked schools in the south Bronx.

The schools that scored above average were: Bronx Center for Science and Mathematics, Bronx High School of Science, and High School of American Studies at Lehman College.

Analysis

Similar to the other boroughs, the schools that scored higher than 511 are either private or specialized. This specialization could suggest that schools focus on providing resources that allow students to succeed in mathematics or other technical fields.

However, compared to all the other boroughs, the Bronx had the smallest share of schools (3.06%) that scored higher than the national average. That is considerably lower than the share in Manhattan (20%), Queens (21%), Staten Island (10%), and Brooklyn (7.3%).
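The borough shares quoted here are simple proportions: the number of schools at or above 511 divided by the number of schools in the borough. A minimal sketch of that arithmetic follows, using a small made-up data frame (`toy_scores` and its values are hypothetical, not rows from the dataset); the real analysis would apply the same steps to `clean_nycscores`.

```r
# Made-up scores for illustration only; not taken from the NYC dataset.
toy_scores <- data.frame(
  Borough = c("Bronx", "Bronx", "Bronx", "Bronx",
              "Queens", "Queens", "Queens", "Queens"),
  Average.Score..SAT.Math. = c(380, 395, 410, 520, 450, 515, 600, 470)
)

national_avg <- 511  # threshold used throughout this report

# Flag each school, then average the flag within each borough to get the share.
toy_scores$above_avg <- toy_scores$Average.Score..SAT.Math. >= national_avg
share_above <- aggregate(above_avg ~ Borough, data = toy_scores, FUN = mean)
share_above$percent_above <- round(100 * share_above$above_avg, 2)
share_above
```

Taking the mean of a TRUE/FALSE column gives the proportion of TRUE values, which avoids counting rows by hand for each borough.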

Although I do not believe that SAT scores are a way to truly evaluate the success of a school, I do believe that they can be a way to analyze and compare boroughs on an observational level.

I am also curious about the demographics of the schools with the highest and lowest SAT scores. The school with the highest SAT score in the Bronx is the Bronx High School of Science. The school demographic is as follows: 62.8% Asian, 22.1% White, 5.5% Hispanic, and 2.6% Black. The school with the lowest SAT score is the Pan American International High School at Monroe. The school demographic is as follows: 100% Hispanic, 0% Asian, 0% Black, and 0% White. After further research, I found that the Pan American International High School at Monroe offers new immigrants, including many unaccompanied minors, the chance to learn English and complete high school.

The Pan American International High School is a great example of why you cannot just use an SAT score to determine the success of a school. However, this marker may be a way to indicate that bilingual students who have just moved to the US have more difficulty taking the SAT and other standardized tests than students who have grown up speaking English.

Source: https://panamericanihs.org/

bronx_rows <- clean_nycscores[which(clean_nycscores$Borough == "Bronx"),] # restrict to Bronx schools before looking for the maximum
bronx_highest <- bronx_rows[which.max(bronx_rows$Average.Score..SAT.Math.),] # Bronx school with the highest SAT math score
bronx_highest
##     School.ID                  School.Name Borough Building.Code
## 204    10X445 Bronx High School of Science   Bronx          X445
##           Street.Address  City State Zip.Code Latitude Longitude
## 204 75 West 205th Street Bronx    NY    10468 40.87706 -73.88978
##     Phone.Number Start.Time End.Time Student.Enrollment Percent.White
## 204 718-817-7700    8:00 AM  3:45 PM               3015         22.1%
##     Percent.Black Percent.Hispanic Percent.Asian Average.Score..SAT.Math.
## 204          2.6%             5.5%         62.8%                      714
##     Average.Score..SAT.Reading. Average.Score..SAT.Writing. Percent.Tested
## 204                         660                         667          97.0%
bronx_lowest <- (clean_nycscores[which(clean_nycscores$Borough == "Bronx" & clean_nycscores$Average.Score..SAT.Math. == 317),]) # Bronx school with the lowest SAT math score
bronx_lowest
##     School.ID                                      School.Name Borough
## 218    12X388 Pan American International High School at Monroe   Bronx
##     Building.Code      Street.Address  City State Zip.Code Latitude
## 218          X420 1300 Boynton Avenue Bronx    NY    10472 40.83137
##     Longitude Phone.Number Start.Time End.Time Student.Enrollment
## 218 -73.87882 718-991-7238    8:30 AM  5:30 PM                428
##     Percent.White Percent.Black Percent.Hispanic Percent.Asian
## 218          0.0%          0.0%           100.0%          0.0%
##     Average.Score..SAT.Math. Average.Score..SAT.Reading.
## 218                      317                         315
##     Average.Score..SAT.Writing. Percent.Tested
## 218                         292          65.6%

nrow(newdata_bronx) #98 schools are in this dataset
## [1] 98
bronx_511 <- (newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. >= 511),]) # schools that scored 511 or higher
bronx_511_schoolnames <- bronx_511$School.Name # names of those schools

summary(newdata_bronx$Average.Score..SAT.Math.)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   317.0   378.2   395.5   404.4   418.0   714.0

nrow(newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. >= 511),]) # 3 schools are above the national average
## [1] 3
nrow(newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math. < 511 ),]) # 95 high schools are below the national average
## [1] 95
nrow(newdata_bronx[which(newdata_bronx$Average.Score..SAT.Math.  < 444),]) # 88 high schools scored below 444
## [1] 88
getColor_bronx <- function(newdata_bronx) {
  sapply(newdata_bronx$Average.Score..SAT.Math., function(Average.Score..SAT.Math.) {
  if(Average.Score..SAT.Math. <= 378) {
    "red"
  } else if(Average.Score..SAT.Math.  <= 511) {
    "orange"
  } else {
    "green"
  } })
}

icons_bronx <- awesomeIcons(
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = getColor_bronx(newdata_bronx)
)

leaflet(newdata_bronx) %>% addTiles() %>%
  addAwesomeMarkers(~Longitude, ~Latitude, icon=icons_bronx,popup = ~as.character(newdata_bronx$School.Name),  label=~as.character(Average.Score..SAT.Math.))

Conclusion

This project was a continuation of a project I did for Richard Ross’s class From Data to Knowledge. For that project, I analyzed the same data set, but focused on the demographic breakdown of schools and their corresponding SAT scores. My original question for that project was: Is there an association between the racial breakdown of a school and the average SAT Math Score in NYC in the 2014-2015 school year?

I used linear regression to look at the relationship between the percentage of students of each recorded race (“Asian”, “Black”, “Hispanic”, and “White”) and the average Math SAT score. This model was interesting because it could moderately predict the average Math SAT score based on the percentage of “Asian” and “White” students in NYC schools. However, that project had many limitations. For this final project, I wanted to build upon it by applying what I learned this year about various visualization packages to the same dataset.

Below I will detail my summary conclusions:

To begin, I want to reiterate that SAT scores are not a complete measure of a school’s success. I chose to focus on the math SAT score because I wanted to focus on one variable. I believe that an SAT score is just a number that represents a small fraction of what a school can do; however, the unfortunate reality is that colleges and employers use SAT scores as a beginning marker for a student’s potential. When a school or community does not have the resources to support the preparation for these high-stakes standardized tests, students and schools are the ones that suffer in the long run. It is these scores that help admit students into colleges and beyond. And it is for these reasons that I am compelled to analyze this data.

I used leaflet to help me map the range of SAT scores. With this package, I was able to immediately see the schools that were above the national average (green markers) and the schools that were below (red and orange markers). Leaflet allowed me to zoom in and out of each borough and see the groups of schools that formed a cluster of reds/oranges or greens. Even though this was a good way to identify the schools’ scores, I noticed I was not able to clearly locate an area with “the best” SAT scores. I had to rely on doing analysis to determine the number and percentage of schools with scores higher than the national average.
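The marker colors come from a separate function per borough, each with its own hard-coded lower cutoff. As a sketch only, the same logic could live in one helper that takes the cutoff as an argument; `score_color` and its parameter names are hypothetical and are not part of the analysis above.

```r
# Hypothetical helper: one color function shared by every borough.
# low_cutoff is the borough-specific "red" threshold (for example, near the 1st quartile);
# national_avg defaults to the 511 used throughout this report.
score_color <- function(scores, low_cutoff, national_avg = 511) {
  ifelse(scores <= low_cutoff, "red",
         ifelse(scores <= national_avg, "orange", "green"))
}

# Made-up scores for illustration only.
toy <- c(317, 400, 511, 600)
score_color(toy, low_cutoff = 378)
```

The returned character vector could be passed to `markerColor` in `awesomeIcons()` in the same way the per-borough functions are. One detail worth noticing: a score of exactly 511 is colored orange here, matching the functions in this report, even though the counts treat scores of 511 or higher as at or above the national average.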

I also found that the schools with the top SAT math scores were either technical, specialized, college preparatory, or private schools. This specialization could suggest that schools focus on providing resources that allow students to succeed in mathematics or other technical fields.

From my analysis, I found that Staten Island had the highest median and mean math SAT scores (465.5 and 486.2). However, there were only 10 accredited high schools in this area. With 69 schools in the dataset, Queens had the second-highest median and mean math SAT scores (448 and 486). The Bronx had the lowest median and mean math SAT scores (395.5 and 404.4).
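The per-borough medians and means can also be computed in one pass rather than from a separate `summary()` call for each borough. The sketch below uses a made-up data frame (`toy_scores` is hypothetical and its values are not from the dataset) and base R only.

```r
# Made-up scores for illustration only; not taken from the NYC dataset.
toy_scores <- data.frame(
  Borough = c("Bronx", "Bronx", "Bronx", "Bronx",
              "Queens", "Queens", "Queens", "Queens"),
  Average.Score..SAT.Math. = c(380, 395, 410, 520, 450, 515, 600, 470)
)

# tapply() applies a function to the scores within each borough.
medians <- tapply(toy_scores$Average.Score..SAT.Math., toy_scores$Borough, median)
means   <- tapply(toy_scores$Average.Score..SAT.Math., toy_scores$Borough, mean)

borough_summary <- data.frame(
  Borough = names(medians),
  median  = as.numeric(medians),
  mean    = as.numeric(means)
)

# Sort from the highest to the lowest median so the ranking is easy to read.
borough_summary <- borough_summary[order(-borough_summary$median), ]
borough_summary
```

Keeping the median and the mean in one table makes it easier to spot boroughs where a few very high scores pull the mean well above the median.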

In addition to learning about the SAT math scores in the specific boroughs, I was also curious about the demographic of the schools with the highest and lowest SAT score in each borough. I found this data by isolating the school with the highest SAT Score and the lowest SAT score and then detailing the percentages. Although this analysis was informational, it was not complete.
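Isolating the highest- and lowest-scoring school in each borough can be done without hard-coding a score. The following is a minimal sketch on a made-up data frame (`toy_schools` and its school names are hypothetical); it assumes no missing scores, as in the cleaned data.

```r
# Made-up schools for illustration only; not taken from the NYC dataset.
toy_schools <- data.frame(
  School.Name = c("School A", "School B", "School C", "School D",
                  "School E", "School F", "School G", "School H"),
  Borough = c("Bronx", "Bronx", "Bronx", "Bronx",
              "Queens", "Queens", "Queens", "Queens"),
  Average.Score..SAT.Math. = c(380, 395, 410, 520, 450, 515, 600, 470),
  stringsAsFactors = FALSE
)

# One data frame per borough.
by_borough <- split(toy_schools, toy_schools$Borough)

# Within each borough, keep the row with the maximum (or minimum) math score.
highest <- do.call(rbind, lapply(by_borough, function(d) {
  d[which.max(d$Average.Score..SAT.Math.), ]
}))
lowest <- do.call(rbind, lapply(by_borough, function(d) {
  d[which.min(d$Average.Score..SAT.Math.), ]
}))

highest
lowest
```

Note that `which.max()` and `which.min()` return only the first match, so a tie for the top or bottom score within a borough would show just one of the tied schools.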

Although a trend did show that the schools with the highest SAT scores (Bronx High School of Science, Staten Island Technical High School, Stuyvesant High School, and Brooklyn Technical High School) had a demographic that was majority Asian and/or White, I am still not certain that this data is completely accurate because it doesn’t account for the students who did not take the SAT.

Finally, in my attempt to learn about the schools with the lowest scores, I isolated the schools in their respective boroughs. I found that many of the schools were multilingual or were designed to host immigrants (Pan American International High School at Monroe, Ralph R. McKee Career and Technical Education High School, Coalition School for Social, Pan American International High School at Queens, and Multicultural High School Brooklyn). These schools often had a demographic of either majority Hispanic or Black students. Knowing the context of a school can help put its SAT scores into perspective.

Future Research

In the future, I would like to continue to learn about SAT score trends and how data can be used to support the schools that need the most help. If we can use data to identify school districts that are falling behind on test scores or other resources, it is possible that local or state governments can provide funding or support.

Overall, this project only touched the surface of my inquiries. Given the limited number of schools in each borough, my analysis was observational. However, I think the use of Leaflet and other summary analysis techniques helped me to understand the geographic differences in SAT math test scores in New York City.