New York School SAT Scores

This Data Visualization is to analyze the SAT scores from all the public high schools in the city of New York. The data source is as follows.

Data Source : https://www.kaggle.com/nycopendata/high-schools

The dataset consists of a row for every accredited high school in New York City along with its borough, latitude/longitude coordinates, student enrollment data, race breakdown and average SAT scores for the 2014 - 2015 school year.

knitr::include_graphics("C:/Users/Shafi Syed/OneDrive/Desktop/doe-logo.jpg")



import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/Users/Shafi Syed/anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")

path = "C:/Users/Shafi Syed/OneDrive/Desktop/"

filename = path + 'PythonNY Schools (1).csv'

df = pd.read_csv(filename)

import_cols = ['Borough', 'Latitude','Longitude','Zip Code', 'Average Score (SAT Math)']

map_df= pd.read_csv(filename, usecols = import_cols)
map_df=map_df.dropna()
map_df=map_df.reset_index(drop=True)


bor_df = map_df.groupby(['Borough']).size().reset_index(name="Count")

Average SAT Scores by NY Boroughs

Let us analyze the average SAT scores for Math, Reading and Writing per each of the Boroughs in New York City from the dataset. Please refer to the below fig. - Avg. SAT Scores by NY Boroughs.

From the visualization we can interpret that Staten Island had the highest average SAT score among all the Boroughs, while Bronx had the lowest. One noticeable data here is that for all the Boroughs, the math average SAT score stood out more than Reading and Writing. This demonstrates the focus math is getting from the schools and the students, especially in the financial capital of the World. We will review in detail the Math SAT score among all the NY Schools in the next visualization. Also it is important to note that Staten Island has the lowest number of schools than Bronx as seen on the attached html - Map of NY schools.


import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/Users/Shafi Syed/anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")

df = df.dropna()
SATAVG_df = df.groupby(['Borough']).agg({'Average Score (SAT Math)':['mean'], 'Average Score (SAT Reading)':['mean'],'Average Score (SAT Writing)':['mean']}).reset_index()
SATAVG_df.columns = ['Borough', 'Math', 'Reading', 'Writing']


fig = plt.figure(figsize=(20,5))
SATAVG_df.plot(x="Borough", y= ["Math", "Reading", "Writing"], kind ="bar")
plt.ylabel('SAT scores', fontsize =10,labelpad=10)
plt.xlabel('Boroughs', fontsize =10,labelpad=1.5)
plt.title(' Avg. SAT Scores by NY Boroughs', fontsize= 16)
plt.yticks(fontsize=8)
(array([  0., 100., 200., 300., 400., 500., 600.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
plt.xticks(fontsize=6, rotation=25)
(array([0, 1, 2, 3, 4]), [Text(0, 0, 'Bronx'), Text(1, 0, 'Brooklyn'), Text(2, 0, 'Manhattan'), Text(3, 0, 'Queens'), Text(4, 0, 'Staten Island')])
plt.legend(loc='upper right', fontsize=8, bbox_to_anchor=(1.12, 0.5))

plt.show()

Math SAT Score in NY Schools

The following map shows schools that have over 400 as SAT score for Math (denoted by green) and less than 400 as SAT score for Math (denoted by red). We could also see the SAT scores for each school as we move the pointer on the map.

The data shows that Manhattan has more schools with over 400 Math SAT score and Bronx and Brooklyn has more schools with lower Math score.

This data tells us that the NYC Dept of Education should increase attention on the Math subject in the boroughs of Bronx and Brooklyn. This will help STEM aspirants and level the playing field when it comes to future opportunities.

*Note: Select Cartodbpositron from the top left of the map to apply the background layer

Demography of Students in NY High Schools (2014-2015)

As shown in the following plot (Demographic of Students in NY Boroughs), we notice that Queens has a higher percentage of students of Asian background. This correlates to the higher population of mostly Chinese Americans in Queens. Staten Island had the most white students among the Boroughs. Brooklyn had the most Black students and Bronx had the most hispanic students among its population. We can also see that Queens seems to be the most cosmopolitan and diverse among the boroughs, having almost equal percentage of students from each race.


Newdf = df.copy(deep=True)
Newdf["Percent White"] = pd.to_numeric(Newdf["Percent White"].str.replace('%', ''), errors='coerce')
Newdf["Percent Black"] = pd.to_numeric(Newdf["Percent Black"].str.replace('%', ''), errors='coerce')
Newdf["Percent Hispanic"] = pd.to_numeric(Newdf["Percent Hispanic"].str.replace('%', ''), errors='coerce')
Newdf["Percent Asian"] = pd.to_numeric(Newdf["Percent Asian"].str.replace('%', ''), errors='coerce')
Newdf["Percent Tested"]=pd.to_numeric(Newdf["Percent Tested"].str.replace('%', ''), errors='coerce')

stacked_df = Newdf.groupby(['Borough']).agg({'Percent White':['mean'],'Percent Black':['mean'], 'Percent Hispanic':['mean'],'Percent Asian':['mean']}).reset_index()
stacked_df.columns = ['Borough', 'White', 'Black', 'Hispanic', 'Asian']

fig = plt.figure(figsize=(20,20))
ax= fig.add_subplot(1,1,1)
stacked_df.plot(kind= 'bar', stacked=True,ax=ax, width=0.2, align='center')

plt.ylabel('% of Students', fontsize =25,labelpad=10)
plt.title('Demographic of Students in NY Boroughs', fontsize=25)
plt.xticks(range(5), ['Bronx','Brooklyn','Manhattan','Queens','Staten Island'], rotation=90, horizontalalignment='center',fontsize=18)
## ([<matplotlib.axis.XTick object at 0x000000005F54A1F0>, <matplotlib.axis.XTick object at 0x000000005F54A1C0>, <matplotlib.axis.XTick object at 0x000000006106F700>, <matplotlib.axis.XTick object at 0x00000000610B56A0>, <matplotlib.axis.XTick object at 0x00000000610B5A00>], [Text(0, 0, 'Bronx'), Text(1, 0, 'Brooklyn'), Text(2, 0, 'Manhattan'), Text(3, 0, 'Queens'), Text(4, 0, 'Staten Island')])
plt.yticks(fontsize=18)
## (array([  0.,  20.,  40.,  60.,  80., 100., 120.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
ax.set_xlabel('New York Boroughs', fontsize=25)

handles, labels = ax.get_legend_handles_labels()
handles = [handles [3],handles [2],handles [1],handles [0]]
labels = [labels [3],labels [2],labels [1],labels [0]]
plt.legend(handles,labels,loc='upper right', fontsize=14, bbox_to_anchor=(1.15, 0.5))


plt.show()

Mean of Percentage of Students Tested

The next visualization shows the mean of percentage of students tested in each borough. Staten Island, Queens, Manhattan had above mean of students tested, while Brooklyn and Bronx fell below the mean. Manhattan had the most number of students tested.


hordf= Newdf.groupby(['Borough']).agg({'Percent Tested':['mean'],}).reset_index()
hordf.columns = ['Borough', 'Percent Tested']

mean = hordf['Percent Tested'].mean()
fig = plt.figure(figsize=(10,10))
ax1= hordf.plot.barh(x='Borough', y='Percent Tested', color= 'purple')
ax1.axvline(mean, color = 'red')
plt.text(mean, .5, 'mean')


ax1.set_title('Mean of Percentage of Students Tested',size=16)
ax1.set_xlabel('Percentage of Students', fontsize=12)
plt.xticks(fontsize=8)
## (array([ 0., 10., 20., 30., 40., 50., 60., 70., 80.]), [Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, ''), Text(0, 0, '')])
plt.yticks(fontsize=8)
## (array([0, 1, 2, 3, 4]), [Text(0, 0, 'Bronx'), Text(0, 1, 'Brooklyn'), Text(0, 2, 'Manhattan'), Text(0, 3, 'Queens'), Text(0, 4, 'Staten Island')])
ax1.legend().set_visible(False)


plt.show()

Students Enrollment by NY Boroughs

The following chart shows the number of studnets enrolled by each New York City Borough. Brooklyn had the highest, while Staten Island had the lowest enrollment, comparatively. This is also dependent on the population density and is directly proportional to it. Obviously, since Brooklyn has the highest population and Staten Island the lowest, the below data is in line with the expectation. NY City schools had a total of 282,916 high school studnets enrolled in the academic year of 2014-2015.

pie_df= Newdf.groupby(['Borough']).agg({'Student Enrollment':['sum'],}).reset_index()
pie_df.columns = ['Borough', 'StudentEnrollment']


number_outside_colors = len(pie_df.Borough.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4

fig = plt.figure(figsize=(10,10))
ax= fig.add_subplot(1,1,1)

colormap= plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

all_students = pie_df.StudentEnrollment.sum()

pie_df.groupby(['Borough'])['StudentEnrollment'].sum().plot(
    kind='pie', radius =1,colors = outer_colors, pctdistance = 0.85, labeldistance = 1.1,
    wedgeprops = dict(edgecolor='w'), textprops = {'fontsize':18},
    autopct = lambda p: '{:.2f}%\n({:.0f})'. format(p, (p/100)*all_students),
    startangle = 90)

hole = plt.Circle((0,0), 0.3, fc ='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Students Enrollment by NY Boroughs', fontsize =20)

ax.text(0,0, 'Total Students\n' + str(round(all_students)),size = 18, ha='center', va= 'center')

ax.axis('equal')
## (-1.1076942823998273, 1.10751548381386, -1.122291398196357, 1.1010614985061504)
plt.tight_layout()


plt.show()

Conclusion

The data shows that NY schools have a good overall SAT scores when compared to the national average. However, there are pockets in certain boroughs as shown above, where more focus on certain subjects can improve the overall academics in the high school. The boroughs with more hispanic and black population need more focus , investments from the dept of Education.

knitr::include_graphics("C:/Users/Shafi Syed/OneDrive/Desktop/Loyola_University_Maryland_shield.png")
This report was developed for academic purpose only as part of GB 736, MBA Programme at Loyola Univeristy Maryland

This report was developed for academic purpose only as part of GB 736, MBA Programme at Loyola Univeristy Maryland