Synopsis:

Considering the importance of education in an increasingly knowledge-based economy, I performed an exploratory data analysis of school performance in relation to various attributes, with the following objectives.


Scope, Variables and Datasets:

Analysis was restricted to NYC public schools (comprising 32 school districts).


Factors considered:

SAT scores covering Math, Reading and Writing were used as the indicator of school performance.

The following datasets were used for the analysis:

  1. School Attendance File
  2. Class size File
  3. Demographics File
  4. School Safety report
  5. 2010 SAT score file
  6. 2014 SAT score file


Pre-Processing:

## integer(0)
##             size  attendance.rate   total.students            crime 
##         2.400599         3.398773         1.832385         3.362739 
##     female.ratio disability.ratio english.learners    poverty.ratio 
##         1.167576         1.633770         2.429917         2.469300
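The Pre-Processing output above is printed without explanation. Its format (one value per candidate predictor, all above 1) is consistent with variance inflation factors (VIF), a standard multicollinearity check that R's car::vif reports as a named vector like this; that reading is an assumption. If it holds, all eight values sit below the commonly used rule-of-thumb cutoff of 5, so no predictor looks severely collinear. A minimal NumPy sketch of the VIF calculation, with a hypothetical vif helper and synthetic data:

```python
import numpy as np


def vif(X, names):
    """Variance inflation factor for each column of X (rows = observations).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = {}
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        result[names[j]] = 1.0 / (1.0 - r2)
    return result


# Synthetic check (not the NYC data): x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.3 * rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"]))
```

Here x1 and x3 should get large VIFs and the unrelated x2 a value close to 1.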


Feature Selection:

The regsubsets function from the R leaps package was used for feature selection.

Of the 8 candidate features, regsubsets selected the following 2 as having some influence on SAT scores:
  1. Class size
  2. Poverty

##   (Intercept)          size poverty.ratio 
##   0.527462034   0.008558815  -0.263718593
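regsubsets searches for the best-fitting model (lowest residual sum of squares) of each size; the final model size is then chosen by a criterion such as BIC, adjusted R² or Mallows' Cp, and the text does not say which one was used here. The sketch below folds both steps into one exhaustive search scored by BIC, which is an assumption, and runs on synthetic data; the function names are hypothetical. With 8 candidate features an exhaustive search only has to fit 2^8 - 1 = 255 models.

```python
import itertools

import numpy as np


def rss_with_intercept(X, y):
    """Residual sum of squares of an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def best_subset_bic(X, y, names):
    """Try every non-empty subset of columns; return the one with the lowest BIC."""
    n, p = X.shape
    best_bic, best_cols = np.inf, ()
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            rss = rss_with_intercept(X[:, list(cols)], y)
            # Gaussian BIC up to an additive constant; k slopes + 1 intercept.
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_bic, best_cols = bic, cols
    return [names[j] for j in best_cols]


# Synthetic data (not the NYC data): only columns "a" and "d" drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(best_subset_bic(X, y, ["a", "b", "c", "d", "e"]))
```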



As a cross-check, feature selection was repeated with stepwise regression (R's step function). It retained the same 2 features, class size and poverty, along with female.ratio, which is not statistically significant (p ≈ 0.14).

##                 Estimate Std. Error   t value    Pr(>|t|)
## (Intercept)   -4.4799063 1.55993754 -2.871850 0.007693257
## poverty.ratio -0.4308900 0.14733625 -2.924535 0.006764502
## size           0.1759099 0.06107862  2.880057 0.007541046
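R's step function starts from a fitted model and repeatedly adds or drops the term that most lowers AIC, stopping when no single change helps (its default direction is "both"). The sketch below is a simpler backward-only version in NumPy, using the same AIC form as R's extractAIC for linear models; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def aic(X, y, cols):
    """AIC of an OLS fit of y on the given columns plus an intercept.

    Uses n*log(RSS/n) + 2*(number of coefficients), constants dropped.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, np.array(cols, dtype=int)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return n * np.log(resid @ resid / n) + 2 * A.shape[1]


def backward_step(X, y, names):
    """Drop the predictor whose removal lowers AIC the most; stop when
    no single removal lowers AIC any further."""
    cols = list(range(X.shape[1]))
    current = aic(X, y, cols)
    while cols:
        trials = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        best_aic, drop = min(trials)
        if best_aic >= current:
            break
        cols.remove(drop)
        current = best_aic
    return [names[c] for c in cols]


# Synthetic data (not the NYC data): only columns "a" and "d" drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(backward_step(X, y, ["a", "b", "c", "d", "e"]))
```

AIC charges 2 per coefficient, less than BIC's log(n), so it can keep a predictor that is not significant at the 5% level. Roughly, dropping a term lowers AIC only when its |t| is below about √2 ≈ 1.41, which is consistent with step keeping female.ratio (t ≈ 1.52).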
## female.ratio   0.1954207 0.12840657  1.521890 0.139251166


Influence of Poverty and Class size over school performance:

I decided to take a closer look at the impact of these two key attributes on overall performance.
When school performance was plotted against class size and against poverty, with fitted regression lines, the result was a surprise.



While the influence of poverty on SAT scores was in line with expectations (higher poverty rates go with lower scores), the impact of class size was totally unexpected: the trend line shows performance declining as class size gets smaller.

On a second look, the class size plot appears to be almost a mirror image of the poverty plot. I wanted to understand the relation between these two factors, and what I found was really interesting.


##                Estimate Std. Error  t value     Pr(>|t|)
## (Intercept)   25.467047  0.3636255 70.03647 8.286524e-35
## poverty.ratio -1.152746  0.3694439 -3.12022 3.974160e-03


Class size falls as the poverty ratio rises (slope ≈ -1.15, p ≈ 0.004): schools in poorer neighborhoods tend to have smaller classes than those in better-off school districts.

Searching for an explanation, I came across the “Contracts for Excellence” legislation, a possible confounding factor.

This legislation funded a set of initiatives starting in 2007-08, including class size reduction, focused on poor neighborhoods and on schools performing below state standards.

The finding that most poor-neighborhood, low-performing schools have comparatively smaller classes may therefore reflect actions taken under the legislation since 2007, as captured in the 2010 class size data.
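A toy simulation shows how a targeting policy of this kind could produce the surprising sign. In the sketch below, class size has no effect on scores by construction; smaller classes are simply assigned to poor or low-scoring schools, a hypothetical rule loosely modeled on the legislation's stated focus. Because assignment also depends on performance, the fitted class-size slope comes out positive both on its own and after controlling for poverty, which is at least consistent with size keeping a positive coefficient next to poverty.ratio in the multiple regression above. All numbers are invented; this is not the NYC data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
poverty = rng.uniform(0.0, 1.0, n)

# By construction, class size has NO effect on scores in this toy model.
score = 500 - 80 * poverty + rng.normal(0, 15, n)

# Hypothetical targeting rule: poor OR low-scoring schools get smaller classes.
targeted = (poverty > 0.6) | (score < 460)
size = 27 - 5 * targeted + rng.normal(0, 1.0, n)

# Slope of score on class size alone (what a single trend line shows).
marginal_slope = np.polyfit(size, score, 1)[0]

# Slope on class size after also controlling for poverty.
A = np.column_stack([np.ones(n), size, poverty])
adjusted_slope = np.linalg.lstsq(A, score, rcond=None)[0][1]

print(f"marginal: {marginal_slope:.2f}  adjusted for poverty: {adjusted_slope:.2f}")
```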


This background presents an interesting question to explore.

If the NYC Department of Education is funding class size reduction to improve school performance, have smaller classes actually improved the performance of the targeted schools relative to schools with larger classes?

Let us look at how performance changed for each class size group by comparing total 2014 scores against 2010 scores.

The notches of the box plots overlap across the size groups, so there is no strong evidence that scores improved more for the groups with smaller classes.
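For reference, a box-plot notch marks an approximate confidence interval for the median: R's boxplot.stats draws it at median ± 1.58 × IQR / √n, and non-overlapping notches are read as strong evidence that two medians differ. Overlap only means a difference is not clearly established. The sketch below approximates that rule in Python, using NumPy percentiles in place of Tukey's hinges; the helper names and the score-change values are hypothetical.

```python
import numpy as np


def notch_interval(values):
    """Approximate box-plot notch: median +/- 1.58 * IQR / sqrt(n).

    R's boxplot.stats uses the same constant but takes the spread from
    Tukey's hinges, so results can differ slightly for small samples.
    """
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    half_width = 1.58 * (q3 - q1) / np.sqrt(len(v))
    return med - half_width, med + half_width


def notches_overlap(a, b):
    """True if the two notch intervals overlap."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical score changes for two class-size groups (invented numbers).
small_classes = [12, -5, 8, 3, -2, 15, 0, 7, -9, 4]
large_classes = [10, -3, 6, 5, -1, 11, 2, 9, -6, 3]
print(notch_interval(small_classes), notch_interval(large_classes))
print(notches_overlap(small_classes, large_classes))
```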

It is also worth checking for improvements in individual subject scores (Math, Reading and Writing) at the school level, which could have been masked in the district-level total scores analyzed above.


Change in Math performance by class size group:

The linear model relating the change in math score (2014 vs 2010) to class size (math.size) is shown below. The slope is essentially zero (p ≈ 0.93), so there is no statistically significant improvement in the math performance of schools with smaller classes over this period.

##                  Estimate  Std. Error     t value  Pr(>|t|)
## (Intercept)  0.0093059876 0.110879575  0.08392878 0.9331615
## math.size   -0.0003821818 0.004496326 -0.08499869 0.9323115
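Each table reports the slope of a simple linear regression of a school's score change on its class size, with its t and p values. A minimal NumPy sketch of that computation follows, assuming the change is simply the 2014 score minus the 2010 score for schools present in both years' files; the school IDs, the values and the slope_t_test helper are hypothetical.

```python
import numpy as np


def slope_t_test(x, y):
    """OLS of y on x with an intercept; returns (slope, std_error, t_value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    A = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - 2)          # residual variance, n - 2 df
    cov = sigma2 * np.linalg.inv(A.T @ A)     # covariance of the coefficients
    se = np.sqrt(cov[1, 1])
    return coef[1], se, coef[1] / se


# Hypothetical data keyed by school ID; some schools are missing in 2014.
rng = np.random.default_rng(3)
ids = [f"school_{i:03d}" for i in range(300)]
score_2010 = {s: rng.normal(0, 1) for s in ids}
score_2014 = {s: rng.normal(0, 1) for s in ids[20:]}
class_size = {s: rng.uniform(15, 34) for s in ids}

common = [s for s in ids if s in score_2014]  # schools present in both years
change = [score_2014[s] - score_2010[s] for s in common]
size = [class_size[s] for s in common]

slope, se, t = slope_t_test(size, change)
print(f"slope={slope:.4f}  se={se:.4f}  t={t:.2f}")
```

If SciPy is available, 2 * scipy.stats.t.sf(abs(t), df=len(common) - 2) gives the two-sided p value shown in the Pr(>|t|) column.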



Change in Reading performance by class size group:

The linear model relating the change in reading score (2014 vs 2010) to class size (english.size) is shown below. The slope is not significant (p ≈ 0.31), so there is no statistically significant improvement in the reading performance of schools with smaller classes over this period.

##                  Estimate  Std. Error    t value  Pr(>|t|)
## (Intercept)  -0.112575543 0.112918641 -0.9969615 0.3194779
## english.size  0.004623194 0.004572386  1.0111120 0.3126673


Change in Writing performance by class size group:

The linear model relating the change in writing score (2014 vs 2010) to class size (english.size) is shown below. The slope is not significant (p ≈ 0.17), so there is no statistically significant improvement in the writing performance of schools with smaller classes over this period.

##                  Estimate  Std. Error   t value  Pr(>|t|)
## (Intercept)  -0.145803992 0.106369693 -1.370729 0.1713455
## english.size  0.005987803 0.004307201  1.390184 0.1653640

After a detailed analysis of score changes by class size, there is no evidence that smaller class sizes produced a statistically significant improvement in student performance between 2010 and 2014.


Conclusion:

