Introduction

This project analyzes the SAT scores of high school students in New York City (NYC) across different school demographics, including race, gender, the share of English language learners, and AP results. The data is provided by the City of New York. The SAT, or Scholastic Aptitude Test, is a test that high school students in the US take before applying to college. Colleges take the scores into account when making admission decisions, so it is fairly important to do well on it. The total SAT score is out of 2400. High schools are often ranked by their average SAT scores, and high SAT scores are considered a sign of how good a school district is.

Throughout this project we show how SAT scores vary and correlate with different parameters such as gender, ethnicity, and other demographic and socioeconomic conditions.

Supplementing the data

Data for this project are gathered from different sources supported by the City of New York. Here are the links to all of the datasets we’ll be using:

Data Processing

Before analyzing the data, we need to understand it. Each link above has a description of the data, along with the relevant columns. We have data on the SAT scores of high schoolers, along with other datasets that contain demographic and other information. Some patterns are obvious: most of the datasets have a DBN column, and some of the datasets have multiple rows for each school.

To work with this data more easily, we need to clean it up and combine the various datasets. This involves filtering out duplicate rows, adding necessary columns, and splitting composite columns.
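
As a rough illustration of the cleaning and combining step, the sketch below reduces a dataset with several rows per school to one row per DBN and then joins it to the SAT data on that column. It is a minimal sketch, not the code used for this project: the inline CSV text, the DBN values, and the `schoolyear`, `total_enrollment`, and `sat_score` column names are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical miniature inputs. The DBN values and all numbers are invented,
# and every column name except DBN is an assumption for illustration.
SAT_CSV = """DBN,sat_score
00X001,1122
00X002,1172
"""

DEMOGRAPHICS_CSV = """DBN,schoolyear,total_enrollment
00X001,20102011,400
00X001,20112012,422
00X002,20112012,394
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def latest_row_per_school(rows, year_key="schoolyear"):
    """Keep a single row per DBN: the one with the largest school year."""
    latest = {}
    for row in rows:
        dbn = row["DBN"]
        if dbn not in latest or int(row[year_key]) > int(latest[dbn][year_key]):
            latest[dbn] = row
    return latest


def combine_on_dbn(sat_rows, demo_by_dbn):
    """Inner join: keep only the schools that appear in both datasets."""
    combined = []
    for row in sat_rows:
        demo = demo_by_dbn.get(row["DBN"])
        if demo is not None:
            combined.append({**demo, **row})
    return combined


combined = combine_on_dbn(
    read_rows(SAT_CSV),
    latest_row_per_school(read_rows(DEMOGRAPHICS_CSV)),
)
for row in combined:
    print(row["DBN"], row["total_enrollment"], row["sat_score"])
```

An inner join drops schools that are missing from either dataset. Whether that is the right choice depends on how many schools would be lost; a left join on the SAT data is the usual alternative.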

Before presenting the analysis, we’ll map out the locations of the schools, which will help us better understand the results.

Enrollment and SAT scores

Now that we’ve set the context by plotting out where the schools are, we can start exploring how the different variables relate to the SAT score. The first step is to explore the relationship between the number of students enrolled in a school and its SAT score.

We can explore this with a scatter plot that compares total enrollments across all schools to SAT scores across all schools.

The graph shows a cluster at the bottom left with low total enrollment and low SAT scores. There is no clear linear relationship between SAT score and total enrollment, although the two appear to be weakly related.
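
The strength of a linear relationship like the one between total enrollment and SAT score can be quantified with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it from scratch; the function name `pearson_r` and the sample values are made up for illustration and are not taken from the project data.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented values for five hypothetical schools.
total_enrollment = [150, 300, 450, 1200, 3000]
sat_score = [950, 1200, 1150, 1300, 1500]

r = pearson_r(total_enrollment, sat_score)
print(f"r = {r:.2f}")
```

A value near 0 indicates little linear association, but it does not rule out structure such as the cluster described above, which is why the scatter plot is still worth inspecting.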

We can explore this further by getting the names of the schools with low enrollment and low SAT scores:

##                                          SCHOOL.NAME
## 123              INTERNATIONAL COMMUNITY HIGH SCHOOL
## 171              ACADEMY FOR LANGUAGE AND TECHNOLOGY
## 172                  BRONX INTERNATIONAL HIGH SCHOOL
## 186            KINGSBRIDGE INTERNATIONAL HIGH SCHOOL
## 189            INTERNATIONAL SCHOOL FOR LIBERAL ARTS
## 235 PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE
## 241                    HIGH SCHOOL OF WORLD CULTURES
## 252               BROOKLYN INTERNATIONAL HIGH SCHOOL
## 287                              PACIFIC HIGH SCHOOL
## 302    INTERNATIONAL HIGH SCHOOL AT PROSPECT HEIGHTS
## 318                       IT TAKES A VILLAGE ACADEMY
## 339                        MULTICULTURAL HIGH SCHOOL
## 373             ASPIRATIONS DIPLOMA PLUS HIGH SCHOOL
## 379           PAN AMERICAN INTERNATIONAL HIGH SCHOOL

The low SAT scores in these schools may be explained by their ESL (English as a Second Language) focus, i.e. some or all seats are reserved for English Language Learners (ELL). These schools run welcoming programs that help new immigrants learn English.

Exploring English language learners and SAT scores

The analysis above suggests that the percentage of English language learners in a school is correlated with lower SAT scores, so we can explore this relationship directly. The ell_percent column is the percentage of students in each school who are learning English. We can make a scatter plot of this relationship.

From the graph we can see that schools with a low proportion of English language learners tend to have high SAT scores, and vice versa. There appears to be a group of schools with a high ell_percent and a low average SAT score. The schools with an SAT score below 1000 and an English language learner percentage above 80% are listed below.

##                                          SCHOOL.NAME
## 123              INTERNATIONAL COMMUNITY HIGH SCHOOL
## 171              ACADEMY FOR LANGUAGE AND TECHNOLOGY
## 172                  BRONX INTERNATIONAL HIGH SCHOOL
## 186            KINGSBRIDGE INTERNATIONAL HIGH SCHOOL
## 235 PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE
## 241                    HIGH SCHOOL OF WORLD CULTURES
## 252               BROOKLYN INTERNATIONAL HIGH SCHOOL
## 302    INTERNATIONAL HIGH SCHOOL AT PROSPECT HEIGHTS
## 339                        MULTICULTURAL HIGH SCHOOL
## 379           PAN AMERICAN INTERNATIONAL HIGH SCHOOL
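
A list like the one above comes from a two-condition filter: SAT score below 1000 and ell_percent above 80. A minimal sketch of that filter follows. The `sat_score` column name, the function name, and the example rows are assumptions for illustration; only `SCHOOL.NAME`, `ell_percent`, and the two thresholds come from the text.

```python
# Invented rows for hypothetical schools; none of these values are real data.
schools = [
    {"SCHOOL.NAME": "EXAMPLE SCHOOL A", "sat_score": 950.0, "ell_percent": 91.0},
    {"SCHOOL.NAME": "EXAMPLE SCHOOL B", "sat_score": 1350.0, "ell_percent": 4.5},
    {"SCHOOL.NAME": "EXAMPLE SCHOOL C", "sat_score": 980.0, "ell_percent": 35.0},
    {"SCHOOL.NAME": "EXAMPLE SCHOOL D", "sat_score": None, "ell_percent": 88.0},
]


def low_sat_high_ell(rows, max_sat=1000, min_ell=80):
    """Return names of schools with SAT score < max_sat and ELL share > min_ell.

    Rows with a missing value in either column are skipped rather than
    treated as matching.
    """
    return [
        row["SCHOOL.NAME"]
        for row in rows
        if row["sat_score"] is not None
        and row["ell_percent"] is not None
        and row["sat_score"] < max_sat
        and row["ell_percent"] > min_ell
    ]


selected = low_sat_high_ell(schools)
print(selected)
```

Skipping rows with missing values is a design choice in this sketch. Schools that did not report an SAT score would otherwise need to be handled explicitly, for example by imputing a value during cleaning.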

Exploring Race and SAT scores

Race is another important factor in this analysis.

From the above graph it looks like higher percentages of White and Asian students correlate with higher SAT scores, while higher percentages of Black and Hispanic students correlate with lower SAT scores.

Gender differences in SAT scores

Gender is the next variable in the analysis. There is a correlation between gender and SAT score, which we can visualize with a bar graph. The data shows that a higher percentage of females in a school tends to correlate with higher SAT scores.


The graph below shows a more detailed look at this phenomenon.

##                                                              SCHOOL.NAME
## 9                                         BARD HIGH SCHOOL EARLY COLLEGE
## 29                              PROFESSIONAL PERFORMING ARTS HIGH SCHOOL
## 34                                         ELEANOR ROOSEVELT HIGH SCHOOL
## 85  FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & ART AND PERFORMING ARTS
## 398                                          TOWNSEND HARRIS HIGH SCHOOL

The graph above shows a group of schools with a high percentage of females and very high SAT scores (in the top right); their names are listed above. This cluster likely accounts for the correlation between higher female percentages and higher SAT scores, and for the corresponding correlation between higher male percentages and lower SAT scores.

AP scores

Next, we analyze the relationship between students taking Advanced Placement (AP) exams and SAT scores. It makes sense that the two would be correlated, since students who are high academic achievers tend to do better on the SAT.

##                         SCHOOL.NAME
## 49           STUYVESANT HIGH SCHOOL
## 200    BRONX HIGH SCHOOL OF SCIENCE
## 251  BROOKLYN TECHNICAL HIGH SCHOOL
## 402 BENJAMIN N. CARDOZO HIGH SCHOOL

The graph above shows a group of schools with a high average AP score (ap_avg) and very high SAT scores (in the top right); their names are listed above. Bronx High School of Science has the most Nobel Prize-winning graduates of any school in the country. Brooklyn Technical High School has challenging academics and lots of high-tech, hands-on learning. Admission to most of these specialized high schools is based solely on the score obtained on the Specialized High Schools Admission Test (SHSAT).

Conclusion

We have now taken a fairly thorough look at NYC public school SAT scores along various demographic and socioeconomic dimensions. This perspective revealed a number of stark trends about who performs well on these tests and by how much. Certain subsets of the population are statistically likely to do much better than others. This could be coincidental, or it could be that our standardized testing model optimizes for certain factors and is not quite as standard as we think. Further analysis could break these factors down further, and perhaps combine them, to derive more insight into the equity of standardized testing.