2014-15 To 2016-17 School- Level NYC Regents

Nicholas Schettini

March 11, 2018

Libraries:

library(tidyverse)
library(readr)
library(knitr)
library(kableExtra)

Read Data:

school_data <- read_csv("2014-15_To_2016-17_School-_Level_NYC_Regents_Report_For_All_Variables.csv")

Viewing the data to see which variables we have. I'm using head() since there are over 212k rows.

head(school_data)  # read_csv() already returns a tibble
## # A tibble: 6 x 18
##   `School DBN` `School Name`   `School Type` `School Level` `Regents Exam`
##   <chr>        <chr>           <chr>         <chr>          <chr>         
## 1 01M034       P.S. 034 Frank~ General Acad~ K-8            Common Core A~
## 2 01M034       P.S. 034 Frank~ General Acad~ K-8            Living Enviro~
## 3 01M034       P.S. 034 Frank~ General Acad~ K-8            Living Enviro~
## 4 01M140       P.S. 140 Natha~ General Acad~ K-8            Common Core A~
## 5 01M140       P.S. 140 Natha~ General Acad~ K-8            Common Core A~
## 6 01M140       P.S. 140 Natha~ General Acad~ K-8            Living Enviro~
## # ... with 13 more variables: Year <int>, `Demographic Category` <chr>,
## #   `Demographic Variable` <chr>, `Total Tested` <int>, `Mean
## #   Score` <chr>, `Number Scoring Below 65` <chr>, `Percent Scoring Below
## #   65` <chr>, `Number Scoring 65 or Above` <chr>, `Percent Scoring 65 or
## #   Above` <chr>, `Number Scoring 80 or Above` <chr>, `Percent Scoring 80
## #   or Above` <chr>, `Number Scoring CR` <chr>, `Percent Scoring CR` <chr>

Looking at the variables, I see that there are different schools, Regents exams (Living Environment, Common Core, etc.), demographics, and test score data. I used to be a science teacher in a public middle school, so it would be interesting to look at the science data.

unique(school_data$`Regents Exam`)
##  [1] "Common Core Algebra"             "Living Environment"             
##  [3] "Common Core English"             "Algebra2/Trigonometry"          
##  [5] "Common Core Algebra2"            "Common Core Geometry"           
##  [7] "English"                         "Geometry"                       
##  [9] "Global History and Geography"    "Integrated Algebra"             
## [11] "Physical Settings/Chemistry"     "Physical Settings/Earth Science"
## [13] "U.S. History and Government"     "Physical Settings/Physics"      
## [15] NA

Living Environment was a subject I taught in middle school, so I'm going to limit the analysis to Living Environment exams.

This dataset covers several school levels; I'm going to focus on junior high/middle schools.

unique(school_data$`School Level`)
## [1] "K-8"                             "High school"                    
## [3] "Junior High-Intermediate-Middle" "Secondary School"               
## [5] "K-12 all grades"                 "Elementary"

The data also breaks students into demographic groups, so it might be interesting to compare their overall scores.

unique(school_data$`Demographic Variable`) 
##  [1] "All Students"                            
##  [2] "SWD"                                     
##  [3] "Non-SWD"                                 
##  [4] "ELL"                                     
##  [5] "English Proficient"                      
##  [6] "Former ELL"                              
##  [7] "Male"                                    
##  [8] "Female"                                  
##  [9] "White"                                   
## [10] "Black"                                   
## [11] "Hispanic"                                
## [12] "Asian"                                   
## [13] "Multiple Race Categories Not Represented"

There are too many schools to display them all, but it would be interesting to see whether my old school is listed.

head(unique(school_data$`School Name`))
## [1] "P.S. 034 Franklin D. Roosevelt"       
## [2] "P.S. 140 Nathan Straus"               
## [3] "P.S. 184m Shuang Wen"                 
## [4] "P.S. 188 The Island School"           
## [5] "Orchard Collegiate Academy"           
## [6] "Technology, Arts, and Sciences Studio"
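Rather than scrolling through every name, a partial, case-insensitive match with base R's grepl() can confirm whether a school is present. A minimal sketch on a small hypothetical vector of names; on the real data the input would be unique(school_data$`School Name`):

```r
# Hypothetical stand-in for unique(school_data$`School Name`)
school_names <- c("P.S. 034 Franklin D. Roosevelt",
                  "P.S. 140 Nathan Straus",
                  "Marsh Avenue School for Expeditionary Learning")

# Case-insensitive partial match on a fragment of the name
matches <- school_names[grepl("marsh avenue", school_names, ignore.case = TRUE)]
print(matches)

# Exact check: is the full name present?
print("Marsh Avenue School for Expeditionary Learning" %in% school_names)
```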

Let's see:

school_data %>%
  filter(`School Name` == "Marsh Avenue School for Expeditionary Learning")
## # A tibble: 83 x 18
##    `School DBN` `School Name`  `School Type` `School Level` `Regents Exam`
##    <chr>        <chr>          <chr>         <chr>          <chr>         
##  1 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Common Core A~
##  2 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Common Core A~
##  3 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Common Core A~
##  4 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Living Enviro~
##  5 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Living Enviro~
##  6 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ U.S. History ~
##  7 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ U.S. History ~
##  8 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ U.S. History ~
##  9 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Common Core A~
## 10 31R063       Marsh Avenue ~ General Acad~ Junior High-I~ Common Core A~
## # ... with 73 more rows, and 13 more variables: Year <int>, `Demographic
## #   Category` <chr>, `Demographic Variable` <chr>, `Total Tested` <int>,
## #   `Mean Score` <chr>, `Number Scoring Below 65` <chr>, `Percent Scoring
## #   Below 65` <chr>, `Number Scoring 65 or Above` <chr>, `Percent Scoring
## #   65 or Above` <chr>, `Number Scoring 80 or Above` <chr>, `Percent
## #   Scoring 80 or Above` <chr>, `Number Scoring CR` <chr>, `Percent
## #   Scoring CR` <chr>

OK, now that I have some information on my dataset, let's take a deeper look.

I noticed that the numeric columns were read in as character type, so I need to fix that:

school_data1 <- school_data
school_data1$`Mean Score` <- as.double(school_data1$`Mean Score`)
## Warning: NAs introduced by coercion
school_data1$`Number Scoring Below 65` <- as.double(school_data1$`Number Scoring Below 65`)
## Warning: NAs introduced by coercion
school_data1$`Percent Scoring Below 65` <- as.double(school_data1$`Percent Scoring Below 65`)
## Warning: NAs introduced by coercion
school_data1$`Number Scoring 65 or Above` <- as.double(school_data1$`Number Scoring 65 or Above`)
## Warning: NAs introduced by coercion
school_data1$`Percent Scoring 65 or Above` <- as.double(school_data1$`Percent Scoring 65 or Above`)
## Warning: NAs introduced by coercion
school_data1$`Number Scoring 80 or Above` <- as.double(school_data1$`Number Scoring 80 or Above`)
## Warning: NAs introduced by coercion
school_data1$`Percent Scoring 80 or Above` <- as.double(school_data1$`Percent Scoring 80 or Above`)
## Warning: NAs introduced by coercion
head(school_data1)
## # A tibble: 6 x 18
##   `School DBN` `School Name`   `School Type` `School Level` `Regents Exam`
##   <chr>        <chr>           <chr>         <chr>          <chr>         
## 1 01M034       P.S. 034 Frank~ General Acad~ K-8            Common Core A~
## 2 01M034       P.S. 034 Frank~ General Acad~ K-8            Living Enviro~
## 3 01M034       P.S. 034 Frank~ General Acad~ K-8            Living Enviro~
## 4 01M140       P.S. 140 Natha~ General Acad~ K-8            Common Core A~
## 5 01M140       P.S. 140 Natha~ General Acad~ K-8            Common Core A~
## 6 01M140       P.S. 140 Natha~ General Acad~ K-8            Living Enviro~
## # ... with 13 more variables: Year <int>, `Demographic Category` <chr>,
## #   `Demographic Variable` <chr>, `Total Tested` <int>, `Mean
## #   Score` <dbl>, `Number Scoring Below 65` <dbl>, `Percent Scoring Below
## #   65` <dbl>, `Number Scoring 65 or Above` <dbl>, `Percent Scoring 65 or
## #   Above` <dbl>, `Number Scoring 80 or Above` <dbl>, `Percent Scoring 80
## #   or Above` <dbl>, `Number Scoring CR` <chr>, `Percent Scoring CR` <chr>
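The "NAs introduced by coercion" warnings mean that as.double() met entries it could not parse as numbers and replaced them with NA. A small sketch with a hypothetical character vector; the "s" placeholder is an assumption for illustration, not necessarily the marker this dataset uses:

```r
# Hypothetical column as it might arrive from read_csv(): numbers stored as text,
# plus non-numeric entries ("s" and an empty string, chosen only for illustration)
raw_scores <- c("72.4", "81", "s", "65.9", "")

# as.double() parses what it can; anything else becomes NA (warning suppressed here)
scores <- suppressWarnings(as.double(raw_scores))
print(scores)

# How many values were lost in the conversion
print(sum(is.na(scores)))
```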

The data has a lot of NAs, including the ones the conversion just introduced, and some columns have more NAs than others. So I either need to remove every row that has an NA or remove the NAs only when I work with a given variable. Number Scoring CR and Percent Scoring CR have many more NAs than the rest, so I'm going to drop those two columns completely.
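To see which columns are worst affected before deciding what to drop, colSums(is.na(...)) counts the missing values in each column. A sketch on a small hypothetical data frame (the column names are made up):

```r
# Hypothetical data frame standing in for school_data1
df <- data.frame(
  mean_score = c(72.4, NA, 65.9, 80.1),
  pct_cr     = c(NA, NA, NA, 55.0)
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_counts <- sort(colSums(is.na(df)), decreasing = TRUE)
print(na_counts)
```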

science_data <- school_data1 %>%
  # %in% keeps rows at either school level; == would recycle the two-element vector
  filter(`Regents Exam` == "Living Environment",
         `School Level` %in% c("Junior High-Intermediate-Middle", "Secondary School")) %>%
  select(`School Name`, `School Level`, `Regents Exam`, Year, `Demographic Variable`,
         `Total Tested`, `Mean Score`, `Number Scoring 65 or Above`,
         `Number Scoring Below 65`, `Percent Scoring 80 or Above`, `Percent Scoring Below 65`)

science_data <- na.omit(science_data)  
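Filtering on a set of allowed values needs %in% rather than ==. With ==, R compares element by element and silently recycles the two-element vector down the column, so a row is kept only when its value happens to line up with the recycled position. When the column length is a multiple of two, this drops matching rows without any warning. A toy illustration with a hypothetical four-row column:

```r
# Hypothetical School Level column: every row is one of the two wanted levels
school_level <- c("Junior High-Intermediate-Middle", "Junior High-Intermediate-Middle",
                  "Secondary School", "Secondary School")
wanted <- c("Junior High-Intermediate-Middle", "Secondary School")

# == recycles `wanted` to length 4 and compares position by position:
# row 1 vs wanted[1], row 2 vs wanted[2], row 3 vs wanted[1], row 4 vs wanted[2]
eq_keep <- school_level == wanted

# %in% asks, for each row, whether its value appears anywhere in `wanted`
in_keep <- school_level %in% wanted

print(eq_keep)
print(in_keep)
```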

Let's look at mean Living Environment scores for middle school students, broken down by demographic group.

science_data %>%
  ggplot(aes(`Mean Score`)) +
  geom_histogram(aes(fill = `Demographic Variable`)) +
  ggtitle("Living Environment - Middle School")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

science_data %>%
  ggplot(aes(`Mean Score`, `Demographic Variable`)) +
  # the default point shape has no fill, so map colour instead
  geom_point(aes(colour = `Demographic Variable`), alpha = 0.5) +
  ggtitle("Living Environment - Middle School")

science_data1 <- science_data %>%
  group_by(`Demographic Variable`) %>%
  summarize(mean(`Mean Score`))

kable(science_data1, "html", escape = F) %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T) %>%
  row_spec(2, bold = T, color = "white", background = "green")  %>%
  row_spec(13, bold = T, color = "white", background = "green")  %>%
  row_spec(4, bold = T, color = "white", background = "red")
Demographic Variable                      mean(Mean Score)
All Students                              73.64864
Asian                                     81.71721
Black                                     71.81100
ELL                                       55.63714
English Proficient                        74.56829
Female                                    73.21759
Former ELL                                76.12270
Hispanic                                  72.59544
Male                                      75.53846
Multiple Race Categories Not Represented  75.24500
Non-SWD                                   74.38908
SWD                                       60.45306
White                                     81.65682
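Each value in this table is an unweighted average of school-level mean scores, so a school that tested a handful of students counts as much as one that tested hundreds. If a per-student average is wanted instead, weighted.mean() with Total Tested as the weights is one option. A sketch with two hypothetical schools:

```r
# Hypothetical school-level rows for one demographic group
schools <- data.frame(
  mean_score   = c(80, 60),
  total_tested = c(90, 10)
)

# Unweighted: each school counts once, as mean(`Mean Score`) does
unweighted <- mean(schools$mean_score)

# Weighted by the number of students tested at each school
weighted <- weighted.mean(schools$mean_score, w = schools$total_tested)

print(c(unweighted = unweighted, weighted = weighted))
```

In the dplyr pipeline this would correspond to summarize(weighted.mean(`Mean Score`, `Total Tested`)) after group_by(); that variant is a suggestion and was not run on this data.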

Conclusion:

Looking at this data, White and Asian students have the highest mean Living Environment scores among middle school students (about 81.7 each). English Language Learners (ELL) have the lowest mean score, 55.6: about 18 points below the All Students average of 73.6, and about 5 points below the next-lowest group, students with disabilities (SWD, 60.5). It's interesting that former ELL students (76.1) score slightly above the All Students average.

The Living Environment exam is broken into several sections, with a lot of reading comprehension. One section is a hands-on lab in which students must follow a set of instructions without any guidance from the teacher. This could be one reason why ELL students struggle with this exam.

I'm interested in how their scores compare on a less reading-heavy exam, like math (though with the Common Core, math is now very reading heavy too).
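The size of these gaps can be computed directly from the table. A sketch with the values copied from the Living Environment table above:

```r
# Group averages copied from the Living Environment table
le <- c(all_students = 73.64864, ell = 55.63714,
        swd = 60.45306, former_ell = 76.12270)

gap_all    <- round(le[["all_students"]] - le[["ell"]], 1)        # All Students minus ELL
gap_swd    <- round(le[["swd"]] - le[["ell"]], 1)                 # SWD minus ELL
gap_former <- round(le[["former_ell"]] - le[["all_students"]], 1) # Former ELL minus All Students

print(c(gap_all = gap_all, gap_swd = gap_swd, gap_former = gap_former))
```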

Let's take a look!

Math has many different exams. The Living Environment is technically a 9th grade exam, but students can take it in 8th grade. Let's break down the Integrated Algebra exam, which I believe is also a 9th grade exam that can be taken in middle school.

math_data <- school_data1 %>%
  # %in% keeps rows at either school level; == would recycle the two-element vector
  filter(`Regents Exam` == "Integrated Algebra",
         `School Level` %in% c("Junior High-Intermediate-Middle", "Secondary School")) %>%
  select(`School Name`, `School Level`, `Regents Exam`, Year, `Demographic Variable`,
         `Total Tested`, `Mean Score`, `Number Scoring 65 or Above`,
         `Number Scoring Below 65`, `Percent Scoring 80 or Above`, `Percent Scoring Below 65`)

math_data <- na.omit(math_data)  
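The Living Environment and Integrated Algebra subsets differ only in the exam name, so the filter-and-drop-NA step could be wrapped in a small function. A base R sketch under simplified assumptions: the helper subset_exam() is hypothetical, and the columns exam, level and score stand in for `Regents Exam`, `School Level` and `Mean Score`:

```r
# Hypothetical helper: one exam at the chosen school levels, rows with NAs removed
subset_exam <- function(data, exam_name,
                        school_levels = c("Junior High-Intermediate-Middle",
                                          "Secondary School")) {
  keep <- data$exam == exam_name & data$level %in% school_levels
  na.omit(data[keep, , drop = FALSE])
}

# Small made-up example
toy <- data.frame(
  exam  = c("Living Environment", "Integrated Algebra",
            "Integrated Algebra", "Integrated Algebra"),
  level = c("Secondary School", "Junior High-Intermediate-Middle",
            "High school", "Secondary School"),
  score = c(75, 68, 70, NA),
  stringsAsFactors = FALSE
)

# Keeps only row 2: row 3 is a high school, row 4 has a missing score
algebra <- subset_exam(toy, "Integrated Algebra")
print(algebra)
```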
math_data %>%
  ggplot(aes(`Mean Score`)) +
  geom_histogram(aes(fill = `Demographic Variable`)) +
  ggtitle("Integrated Algebra - Middle School")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

math_data %>%
  ggplot(aes(`Mean Score`, `Demographic Variable`)) +
  # the default point shape has no fill, so map colour instead
  geom_point(aes(colour = `Demographic Variable`), alpha = 0.5) +
  ggtitle("Integrated Algebra - Middle School")

math_data1 <- math_data %>%
  group_by(`Demographic Variable`) %>%
  summarize(mean(`Mean Score`))

kable(math_data1, "html", escape = F) %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T) %>%
  row_spec(2, bold = T, color = "white", background = "green") %>%
  row_spec(11, bold = T, color = "white", background = "red") %>%
  row_spec(4, bold = T, color = "white", background = "lightblue")
Demographic Variable    mean(Mean Score)
All Students            67.71231
Asian                   76.13333
Black                   65.85000
ELL                     61.24375
English Proficient      67.97083
Female                  67.71091
Former ELL              70.15714
Hispanic                67.10541
Male                    67.82000
Non-SWD                 68.57347
SWD                     58.88000
White                   73.11429

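Conclusions:

These comparisons are between two different exams, not changes over time. ELL students' mean score is about 5.6 points higher on Integrated Algebra than on Living Environment, while the Asian and White means are about 5.6 and 8.5 points lower, respectively. The lowest-scoring group on Integrated Algebra is students with disabilities (SWD, 58.9) rather than ELL students (61.2).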
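Lining the two summary tables up with merge() makes the per-group differences between the exams explicit. A sketch using a subset of the groups, with values copied from the two tables:

```r
# Group averages copied from the Living Environment and Integrated Algebra tables
living <- data.frame(
  group      = c("All Students", "Asian", "White", "ELL", "SWD"),
  living_env = c(73.64864, 81.71721, 81.65682, 55.63714, 60.45306)
)
algebra <- data.frame(
  group   = c("All Students", "Asian", "White", "ELL", "SWD"),
  algebra = c(67.71231, 76.13333, 73.11429, 61.24375, 58.88000)
)

# Join on group, then compute Integrated Algebra minus Living Environment
both <- merge(living, algebra, by = "group")
both$difference <- round(both$algebra - both$living_env, 1)
print(both)
```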

It's also interesting to look at the spread of the data in the point plots: the math scores are much more spread out than the science scores.

Let's compare my old school to the citywide data.
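"More spread out" can be put into numbers with the standard deviation or the interquartile range of the school-level mean scores for each exam. A sketch with made-up values (not from the dataset); on the real data the inputs would be the `Mean Score` columns of the two subsets:

```r
# Hypothetical school-level mean scores for each exam
living_scores  <- c(70, 72, 74, 76, 78)
algebra_scores <- c(55, 62, 68, 74, 81)

# Standard deviation and interquartile range as numeric summaries of spread
spread <- data.frame(
  exam = c("Living Environment", "Integrated Algebra"),
  sd   = c(sd(living_scores), sd(algebra_scores)),
  iqr  = c(IQR(living_scores), IQR(algebra_scores))
)
print(spread)
```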

former_school <- school_data1 %>%
  filter(`Regents Exam` == "Living Environment",
         `School Name` == "Marsh Avenue School for Expeditionary Learning") %>%
  select(`School Name`, `School Level`, `Regents Exam`, Year, `Demographic Variable`,
         `Total Tested`, `Mean Score`, `Number Scoring 65 or Above`,
         `Number Scoring Below 65`, `Percent Scoring 80 or Above`, `Percent Scoring Below 65`)

former_school <- na.omit(former_school)  

former_school
## # A tibble: 11 x 11
##    `School Name`    `School Level`   `Regents Exam`  Year `Demographic Va~
##    <chr>            <chr>            <chr>          <int> <chr>           
##  1 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2016 All Students    
##  2 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2017 All Students    
##  3 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2017 Non-SWD         
##  4 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2016 Female          
##  5 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2016 Male            
##  6 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2017 Female          
##  7 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2017 Male            
##  8 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2016 Asian           
##  9 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2016 White           
## 10 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2017 Asian           
## 11 Marsh Avenue Sc~ Junior High-Int~ Living Enviro~  2017 White           
## # ... with 6 more variables: `Total Tested` <int>, `Mean Score` <dbl>,
## #   `Number Scoring 65 or Above` <dbl>, `Number Scoring Below 65` <dbl>,
## #   `Percent Scoring 80 or Above` <dbl>, `Percent Scoring Below 65` <dbl>
former_school %>%
  ggplot(aes(`Mean Score`)) +
  geom_histogram(aes(fill = `Demographic Variable`)) +
  ggtitle("Living Environment - Marsh Avenue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

former_school %>%
  ggplot(aes(`Mean Score`, `Demographic Variable`)) +
  geom_point(alpha = 0.5) +
  ggtitle("Living Environment - Marsh Avenue")

former_school %>%
  # density needs only x; one curve per demographic group via fill.
  # ggplot2 drops groups with fewer than two rows (they cannot get a curve)
  ggplot(aes(`Mean Score`, fill = `Demographic Variable`)) +
  geom_density(alpha = 0.5) +
  ggtitle("Living Environment - Marsh Avenue")

former_school1 <- former_school %>%
  group_by(`Demographic Variable`) %>%
  summarize(mean(`Mean Score`))

kable(former_school1, "html", escape = F) %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T)
Demographic Variable    mean(Mean Score)
All Students            83.30
Asian                   87.30
Female                  82.35
Male                    84.45
Non-SWD                 81.70
White                   81.65

Based on the students who took the Living Environment Regents, my old school had a higher "All Students" mean than the city's middle school average (83.30 vs. 73.65).
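The same merge() approach lines up every demographic group the school reported against the citywide Living Environment figures. A sketch using values copied from the two tables; keep in mind that the school's averages rest on only 11 rows across 2016 and 2017:

```r
# Values copied from the Marsh Avenue and citywide Living Environment tables
school <- data.frame(
  group = c("All Students", "Asian", "Female", "Male", "Non-SWD", "White"),
  marsh = c(83.30, 87.30, 82.35, 84.45, 81.70, 81.65)
)
city <- data.frame(
  group = c("All Students", "Asian", "Female", "Male", "Non-SWD", "White"),
  city  = c(73.64864, 81.71721, 73.21759, 75.53846, 74.38908, 81.65682)
)

# Join on group, then compute school minus city for each group
cmp <- merge(school, city, by = "group")
cmp$difference <- round(cmp$marsh - cmp$city, 1)
print(cmp)
```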