Intro:

This data set comes from Tora’s post on the week 5 discussion board. It focuses on the NYCDOE School Quality Report data for the 2020-2021 school year.

The data set has a lot of columns. Fortunately, we can select only the relevant columns to answer Tora’s analysis questions.

## Using Tora's dataset to analyze the data
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
school <- read.csv("https://raw.githubusercontent.com/AldataSci/Project2-Data607/main/2020-2021_School_Quality_Reports_-_Early_Childhood_Schools.csv",header=TRUE,sep=",") 

1. Compare the average student attendance (in-person) to the teachers’ years of experience

Unfortunately, much of the data set is filled with “No Data” placeholders, so the analysis is not as insightful as I had hoped. Grouping the data by teacher experience and arranging it by attendance in descending order returns mostly “No Data” in the attendance column.

## Question: Compare the average student attendance (in-person) to the teachers' years of experience

school %>%
  select(dbn,school_name,Teach_3_more_exp,n_attendance_inperson_k3_all) %>%
  group_by(Teach_3_more_exp) %>%
  arrange(desc(n_attendance_inperson_k3_all))
## # A tibble: 52 x 4
## # Groups:   Teach_3_more_exp [34]
##    dbn    school_name                          Teach_3_more_exp n_attendance_in~
##    <chr>  <chr>                                <chr>            <chr>           
##  1 84K910 Cypress Hills Ascend Charter School  0.6              No Data         
##  2 84K927 Ivy Hill Preparatory Charter School  0.53             No Data         
##  3 84K929 East Brooklyn Ascend Charter School  0.75             No Data         
##  4 84K930 LEEP Dual Language Academy Charter ~ No Data          No Data         
##  5 84K933 Lefferts Gardens Ascend Charter Sch~ 0.9              No Data         
##  6 84K934 East Flatbush Ascend Charter School  0.88             No Data         
##  7 84K937 Brooklyn RISE Charter School         No Data          No Data         
##  8 84M373 Zeta Charter Schools - Inwood 1      No Data          No Data         
##  9 84M380 Harlem Village Academy West 2 Chart~ 0.78             No Data         
## 10 84M383 Storefront Academy Harlem Charter S~ 0.6              No Data         
## # ... with 42 more rows
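
The “No Data” placeholders are also why both columns print as `<chr>` above. As a minimal sketch (not what this report actually does), the `na.strings` argument of `read.csv()` can turn those placeholders into `NA` at read time so the columns parse as numbers. The inline `csv_text` is an assumption made for illustration: it holds four rows copied from the printed outputs in this report rather than the full file.

```r
# Four rows copied from the outputs printed in this report; the real file has many more columns.
csv_text <- "dbn,Teach_3_more_exp,n_attendance_inperson_k3_all
84K910,0.6,No Data
84K930,No Data,No Data
24Q028,1,278
25Q242,0.969,114"

# Treat the literal string "No Data" as a missing value while reading.
sample_rows <- read.csv(text = csv_text, header = TRUE, na.strings = "No Data")

# Both measurement columns are now numeric instead of character.
str(sample_rows)

# Count how many values are missing in each column.
colSums(is.na(sample_rows))
```

The same `na.strings = "No Data"` argument could be added to the `read.csv()` call on the full URL, assuming “No Data” is the only placeholder used in the file.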

Filtering Out the No-Data

Here I filtered out the rows with “No Data” for average in-person attendance to get some semblance of understanding, then grouped the data by the share of teachers with more than 3 years of experience and arranged it in descending order. After this filtering, there appears to be a positive relationship between teacher experience and in-person attendance. At P.S. 28, 100% of the teachers had more than 3 years of teaching experience, and the school had an average in-person attendance of 278 students. At the other end of the table, at PSX088, 11% of teachers had more than 3 years of experience, and an average of 11 students attended school in person. This suggests that schools with more experienced teachers were better at keeping students in in-person classes.

school %>%
  select(dbn,school_name,Teach_3_more_exp,n_attendance_inperson_k3_all) %>%
  group_by(Teach_3_more_exp) %>%
  arrange(desc(Teach_3_more_exp)) %>%
  filter(n_attendance_inperson_k3_all != "No Data")
## # A tibble: 22 x 4
## # Groups:   Teach_3_more_exp [20]
##    dbn    school_name                          Teach_3_more_exp n_attendance_in~
##    <chr>  <chr>                                <chr>            <chr>           
##  1 24Q028 P.S. 28 - The Thomas Emanuel Early ~ 1                278             
##  2 25Q242 P.S. 242 Leonard P. Stavisky Early ~ 0.969            114             
##  3 25Q244 The Active Learning Elementary Scho~ 0.968            159             
##  4 30Q228 The Ivan Lafayette Early Childhood ~ 0.962            154             
##  5 22K326 P.S. 326                             0.933            74              
##  6 26Q376 P.S. 376                             0.917            141             
##  7 09X170 P.S. 170                             0.905            118             
##  8 30Q222 P.S. Q222 - Fire Fighter Christophe~ 0.895            148             
##  9 04M112 P.S. 112 Jose Celso Barbosa          0.871            138             
## 10 27Q051 P.S. 051                             0.824            107             
## # ... with 12 more rows
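
To put a number on the relationship described above, one option is to convert both character columns to numeric and compute a correlation. The sketch below is illustrative only: `shown_rows` is a hypothetical data frame holding just the ten rows printed above plus one “No Data” row from the earlier output, so its result does not describe all 22 filtered schools. The attendance column is also a head count, so school size affects it as well.

```r
library(dplyr)

# The ten rows printed above, plus one "No Data" row (84K930) from the earlier output.
shown_rows <- data.frame(
  dbn = c("24Q028", "25Q242", "25Q244", "30Q228", "22K326", "26Q376",
          "09X170", "30Q222", "04M112", "27Q051", "84K930"),
  Teach_3_more_exp = c("1", "0.969", "0.968", "0.962", "0.933", "0.917",
                       "0.905", "0.895", "0.871", "0.824", "No Data"),
  n_attendance_inperson_k3_all = c("278", "114", "159", "154", "74", "141",
                                   "118", "148", "138", "107", "No Data"),
  stringsAsFactors = FALSE
)

# Drop placeholder rows in BOTH columns, then convert the text to numbers.
numeric_rows <- shown_rows %>%
  filter(Teach_3_more_exp != "No Data",
         n_attendance_inperson_k3_all != "No Data") %>%
  mutate(across(c(Teach_3_more_exp, n_attendance_inperson_k3_all), as.numeric))

# Pearson correlation between experience share and in-person attendance.
experience_attendance_cor <- cor(numeric_rows$Teach_3_more_exp,
                                 numeric_rows$n_attendance_inperson_k3_all)
print(experience_attendance_cor)
```

Converting to numeric before sorting also matters because `arrange(desc(...))` on a character column sorts alphabetically rather than by value.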

2. Movement of students with disabilities to less restrictive environments vs. race of the student

For this question I pulled out the school name, n_lre_all (movement of students with disabilities to less restrictive environments), and the percentage of each ethnic group that attends these schools. I grouped the data by n_lre_all and arranged it in descending order. Here we can see that at P.S. 112, 92 students with disabilities were moved to less restrictive environments, with Hispanic students making up the largest group, followed by Black students.

school %>%
  select(dbn,school_name,n_lre_all,ethnicity_asian_pct:ethnicity_white_pct) %>%
  group_by(n_lre_all) %>%
  arrange(desc(n_lre_all))
## # A tibble: 52 x 9
## # Groups:   n_lre_all [36]
##    dbn    school_name                n_lre_all ethnicity_asian~ ethnicity_black~
##    <chr>  <chr>                          <int>            <dbl>            <dbl>
##  1 04M112 P.S. 112 Jose Celso Barbo~        92            0.013            0.206
##  2 84R076 Bridge Preparatory Charte~        90            0.006            0.293
##  3 24Q028 P.S. 28 - The Thomas Eman~        74            0.062            0    
##  4 84M373 Zeta Charter Schools - In~        51            0.005            0.171
##  5 08X583 P.S. 583                          49            0.399            0.071
##  6 84X586 Brilla Veritas Charter Sc~        44            0.003            0.279
##  7 30Q228 The Ivan Lafayette Early ~        42            0.069            0    
##  8 84X609 Zeta Charter Schools - Br~        42            0.006            0.37 
##  9 84X623 Neighborhood Charter Scho~        42            0.009            0.277
## 10 25Q242 P.S. 242 Leonard P. Stavi~        41            0.762            0.036
## # ... with 42 more rows, and 4 more variables: ethnicity_hispanic_pct <dbl>,
## #   ethnicity_amerindian_pct <dbl>, ethnicity_pacific_pct <dbl>,
## #   ethnicity_white_pct <dbl>

3. “Economic Need Index” vs. “Number of Remote Learning Days”

To answer this question I first needed to understand what an economic need index is. According to Google, the economic need index estimates the percentage of students facing economic hardship. A student gets a score of 1.0 if the student is eligible for public assistance, has lived in temporary housing in the past few years, or has entered the NYCDOE for the first time and speaks a language other than English.

Calculating the Days of Remote Learning

This column doesn’t exist in the data set, so I created a new column called percentage_of_remote_learning_days. To calculate it, I used mutate to divide the average attendance of students who attended school remotely by the total average student attendance, and I stored the result in a data frame called School.

## Mutating a new column to the dataframe School
School <- school %>%
  select(dbn,school_name,eni_pct,n_attendance_remote_k3_all,n_attendance_k3_all) %>%
  mutate(percentage_of_remote_learning_days = n_attendance_remote_k3_all/n_attendance_k3_all)

Making sense of the data

According to Google, the DOE considers a school to be economically stratified if its economic need is more than 10 percentage points from the citywide average, in either direction. If a school is more than 10 percentage points above the citywide average, it is skewed toward lower incomes; if a school is more than 10 percentage points below the citywide average, it is skewed toward higher incomes. In NYC the average eni_pct is 0.71. With this in mind, the table suggests that schools with a higher economic need index had more students taking classes remotely, while schools with a lower need had students attending remotely a bit less frequently.

School %>%
  select(dbn,school_name,eni_pct,percentage_of_remote_learning_days) %>%
  group_by(eni_pct) %>%
  arrange(desc(percentage_of_remote_learning_days))
## # A tibble: 52 x 4
## # Groups:   eni_pct [51]
##    dbn    school_name                                   eni_pct percentage_of_r~
##    <chr>  <chr>                                           <dbl>            <dbl>
##  1 84X630 Girls Preparatory Charter School of the Bron~   0.908             2.89
##  2 84X632 Zeta Charter School - Bronx 2                   0.923             1.49
##  3 84X633 Wildflower New York Charter School              1                 1.43
##  4 84X631 Zeta Charter School - Bronx 3                   0.86              1.34
##  5 84X609 Zeta Charter Schools - Bronx 1                  0.887             1.17
##  6 84K930 LEEP Dual Language Academy Charter School       0.793             1.09
##  7 84K927 Ivy Hill Preparatory Charter School             0.735             1.08
##  8 84Q414 Renaissance Charter School 2                    0.685             1.08
##  9 84X619 Bronx Arts and Science Charter School           0.77              1.07
## 10 84X599 Cardinal McCloskey Community Charter School     0.889             1.07
## # ... with 42 more rows
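
The 10-percentage-point rule described above can be applied directly with `case_when()`. This is a rough sketch under stated assumptions: `shown_rows` is a hypothetical data frame containing only the ten rows printed above (the schools with the highest remote ratio), the citywide average of 0.71 is the figure quoted in this report, and the band labels are my own wording.

```r
library(dplyr)

# Only the ten rows printed above; the full School data frame has 52 rows.
shown_rows <- data.frame(
  dbn = c("84X630", "84X632", "84X633", "84X631", "84X609",
          "84K930", "84K927", "84Q414", "84X619", "84X599"),
  eni_pct = c(0.908, 0.923, 1, 0.86, 0.887, 0.793, 0.735, 0.685, 0.77, 0.889),
  percentage_of_remote_learning_days = c(2.89, 1.49, 1.43, 1.34, 1.17,
                                         1.09, 1.08, 1.08, 1.07, 1.07),
  stringsAsFactors = FALSE
)

citywide_eni <- 0.71  # citywide average quoted in this report
band_width <- 0.10    # 10 percentage points in either direction

# Label each school by where its ENI falls relative to the citywide average.
banded_rows <- shown_rows %>%
  mutate(eni_band = case_when(
    eni_pct > citywide_eni + band_width ~ "skewed toward lower incomes",
    eni_pct < citywide_eni - band_width ~ "skewed toward higher incomes",
    TRUE ~ "within 10 points of citywide average"
  ))

# Count schools and average the remote ratio within each band.
band_summary <- banded_rows %>%
  group_by(eni_band) %>%
  summarise(schools = n(),
            mean_remote_ratio = mean(percentage_of_remote_learning_days))

print(band_summary)
```

Running the same steps on the full `School` data frame, rather than on the top ten rows, would give a fairer comparison between the bands.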

Conclusion:

I think this was a good data set for practicing with tidyr and dplyr. I was really interested in the data set Tora presented on the discussion board, mainly because I have a younger brother who has ADHD, and the results were mostly unsurprising to me when answering Tora’s analysis questions. For instance, schools whose teachers were newly minted or had less experience usually had lower attendance, while the students with disabilities who were moved most often were usually from minority groups such as Hispanic and Black students. I may explore this data further, because it is really interesting to work with and there are more insights to gain. Tora’s questions were a good place to start analyzing key relationships between ethnicity, income, and remote vs. in-person attendance.