Project

Author

Asher Scott

For my project I used a data set from the National Center for Education Statistics that listed information on student enrollment: how many children were in each grade for each school, districts: which school district each school belonged to, student teacher ratio, address: where the school was located and more. I decided to focus on Louisiana because I was curious in seeing the differences in rural and city enrollment. I chose Lousisiana specifically because I wanted a state that had a good ratio of both as to make sure my visualization would be interesting. Majority of my time on this project consisted of cleaning: filtering out what variables I wanted, converting formats and making sure to remove all the NAs (there was a lot)

From lines 11-15 I load in the appropriate libraries and call in my Public School dataset

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use

library(stringr)
setwd("/Users/asherscott/Desktop/Data 110")
PS<- read_csv("Public_School_2017-18 .csv")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 100729 Columns: 79
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (21): SURVYEAR, STABR, ST_LEAID, LEA_NAME, SCH_NAME, LSTREET1, LSTREET2,...
dbl (55): X, Y, OBJECTID, NCESSCH, LEAID, LZIP, LZIP4, STATUS, TOTFRL, FRELC...
lgl  (3): LSTREET3, G13, AE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

From lines 19-23 I select 25 out the original 79 variables I plan on using in throughout my code and call it PS1. I then summon a table to compare how many observations each state/territory have, and I end up choosing to go with Louisiana (LA)

PS1 <- PS %>% 
  select(LATCOD, LONCOD, STABR, LEA_NAME, SCH_NAME, LSTREET1, LCITY, LSTATE, SCHOOL_TYPE_TEXT, SY_STATUS_TEXT, ULOCALE, G01, G02, G03, G04, G05, G06, G07, G08, G09, G10, G11, G12, TOTAL, STUTERATIO)
table(PS1$STABR)


   AK    AL    AR    AS    AZ    BI    CA    CO    CT    DC    DE    FL    GA 
  513  1478  1087    28  2414   174 10323  1900  1031   228   229  4375  2307 
   GU    HI    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI 
   41   294  1332   756  4245  1924  1319  1534  1390  1854  1420   599  3734 
   MN    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY    OH 
 2558  2434  1060   822  2691   525  1095   495  2595   884   734  4808  3610 
   OK    OR    PA    PR    RI    SC    SD    TN    TX    UT    VA    VI    VT 
 1800  1249  2990  1121   317  1264   703  1782  9320  1066  2124    28   311 
   WA    WI    WV    WY 
 2427  2283   735   369

From lines 27-34 I use the filter string detect to select only Louisiana, and then I filter again to remove any NAs from the grades and Student teacher Ratio. I excluded Kindergarten and Pre-K becuase most of their listings were just 0 and didn’t add anything. I also summon two more tables because I wanted to get a better understanding for the ranges of locale as well as student teacher ratio.

PS2 <- PS1 %>% 
  filter(str_detect(STABR, "LA"))
PS3 <- PS2 %>%
  filter(if_all(c(G01, G02, G03, G04, G05, G06, G07, G08, G09, G10, G11, G12, STUTERATIO), ~ !is.na(.)))
table(PS3$ULOCALE)


     11-City: Large   12-City: Mid-size      13-City: Small    21-Suburb: Large 
                  3                   6                   6                   3 
22-Suburb: Mid-size    32-Town: Distant    41-Rural: Fringe   42-Rural: Distant 
                  1                   2                  10                  33 
   43-Rural: Remote 
                 22

table(PS3$STUTERATIO)


 3.89   4.9  5.36  5.45  6.17  6.31  7.47  7.82  8.39  8.89  9.43  9.45  9.47 
    1     1     1     1     1     1     1     1     1     1     1     1     1 
10.47 10.67 11.54 11.87  12.5 12.71 12.74 12.99    13 13.06 13.17 13.22 13.39 
    1     1     1     1     1     2     1     1     1     1     1     1     1 
13.41 13.45 13.63  13.9    14 14.05  14.1 14.14 14.36 14.38  14.5 14.55 14.61 
    1     1     1     1     1     1     1     1     1     1     1     1     1 
14.93 15.17 15.35 15.46 15.51 15.84 15.85 15.95 16.08 16.33 16.46 16.64 16.65 
    1     1     1     1     1     1     1     1     1     1     1     1     1 
16.78 16.86    17 17.19 17.23 17.47 17.54 17.74 17.83 17.95 18.15 18.53 18.56 
    1     2     1     1     1     1     1     1     1     1     1     1     1 
18.81 19.05 19.13 19.46 19.58 19.91 20.03 20.37    21 21.09 21.95    29 30.65 
    2     1     1     1     2     1     1     1     1     1     1     1     1 
32.29    37 41.48    84 
    1     1     1     1

From lines 44-59 I use the function mutate to create an additional column called “Location” that assigns each row to either Rural, City, Town, or Suburban (Note: almost all suburban schools were listed as alternative schools or juvenile centers). I then Pivoted to long format so highcharter could read my data more effectively

PS4 <- PS3 %>%
  mutate(Location = case_when(
    ULOCALE %in% c("11-City: Large", "12-City: Mid-size", "13-City: Small") ~ "City",
    ULOCALE %in% c("21-Suburb: Large", "22-Suburb: Mid-size") ~ "Suburban",
    ULOCALE == "32-Town: Distant" ~ "Town",
    ULOCALE %in% c("41-Rural: Fringe", "42-Rural: Distant", "43-Rural: Remote") ~ "Rural",
    TRUE ~ "Unknown"))
PS5 <- PS4 %>%
  pivot_longer(
    cols = G01:G12,            
    names_to = "Grade_Level",    
    values_to = "Student_Count")
lm_model <- lm(STUTERATIO ~ Student_Count, data = PS5)
summary(lm_model)


Call:
lm(formula = STUTERATIO ~ Student_Count, data = PS5)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.884  -3.544  -1.214   1.548  69.466 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   14.213596   0.414284  34.309  < 2e-16 ***
Student_Count  0.064013   0.008429   7.594 6.93e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.335 on 1030 degrees of freedom
Multiple R-squared:  0.05302,   Adjusted R-squared:  0.0521 
F-statistic: 57.67 on 1 and 1030 DF,  p-value: 6.933e-14

From lines 63-67 I made table to see which cities i could potentially use for my graph. I choose three cities that all had the same amount of observations (12) and fell into one of the three categories: rural, City, Town.

table(PS5$LCITY)


   Alexandria       Atlanta       Bastrop   Baton Rouge     Bell City 
           12            12            12            60            12 
     Bogalusa  Bossier City   Braithwaite   Bridge City        Calvin 
           12            12            12            12            12 
      Cameron        Castor   Centerville      Converse     Coushatta 
           12            12            12            12            12 
        Delhi        Dodson    Downsville       Doyline     Elizabeth 
           12            12            12            12            12 
        Elmer          Epps         Evans   Farmerville      Ferriday 
           12            12            12            12            12 
      Florien        Forest    Georgetown      Gibsland      Glenmora 
           12            12            12            12            24 
    Grambling Grand Chenier    Grand Isle         Grant     Hackberry 
           12            12            12            12            12 
      Hammond       Harahan  Harrisonburg        Holden      Hornbeck 
           12            12            12            12            12 
        Houma     Jefferson      Jennings     Jonesboro    Jonesville 
           12            12            12            12            12 
    Kilbourne     Lacassine     Lafayette  Lake Charles     Leesville 
           12            12            12            12            12 
         Lena    Logansport    Mandeville       Mansura      Maurepas 
           12            24            12            12            12 
   Merryville        Monroe      Monterey    Mt. Hermon       Negreet 
           12            24            12            12            12 
  New Orleans         Noble        Pitkin Pleasant Hill       Quitman 
           36            12            12            12            12 
       Reeves        Saline Sicily Island       Simpson      Simsboro 
           12            12            12            12            12 
       Singer   St. Gabriel        Starks       Sulphur   Summerfield 
           12            12            12            12            24 
    Thibodaux 
           12

PS7 <- PS5 %>%
  filter(LEA_NAME %in% c("Iberville Parish", "Lincoln Preparatory School", "Lafourche Parish"))

Form lines 72-91 I made line chart using high charter that displays the Rural, City, and Town student counts along grades 1st-12th. When you hover over each point it will list the grade count and then the Student-Teacher ratio. The data set already came with the student teacher ratio aggregated, so it doesn’t change.

PS7_chart <- highchart() %>%
  hc_add_series(data = PS7,
                type = "line",
                hcaes(x = Grade_Level,
                      y = Student_Count,
                      group = Location)) %>%
  hc_colors(c("purple", "navy", "darkorange", "hotpink")) %>%
  hc_xAxis(type = "category", 
           title = list(text="Grade Level")) %>%
  hc_yAxis(title = list(text="Student Count")) %>%
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_legend(align = "right",
            verticalAlign = "top") %>%
  hc_tooltip(shared = TRUE,
             borderColor = "black",
             pointFormat = "{point.Location}:{point.Student_Count:.2f}, {point.STUTERATIO:.2f}<br>") %>%
  hc_caption(text = "Source: National Center for Education Statistics")
PS7_chart

From lines 95-106 I created a popup the list the school name, street, locale, Student Teacher ratio for that school and total number of students

PS6 <- PS5 %>%
  rename(latitude = LATCOD, longitude = LONCOD)

Publicpop <- paste0(
  "<b>School: </b>", PS6$SCH_NAME, "<br>",
  "<b>Street: </b>", PS6$LSTREET1, "<br>",
  "<b>Region: </b>", PS6$Location, "<br>",
  "<b>Students-Teacher ratio: </b>", as.character(PS6$STUTERATIO), "<br>",
  "<b>Total Students: </b>", as.character(PS6$TOTAL), "<br>")

From lines 110-124 created a Map showing all the Louisiana schools, and used the color pallette Set2 based of Location. Gree is city, Orange is rural, Purple is town, Blue is Suburban

library(leaflet)

palette <- colorFactor(palette = "Set2", domain = PS6$Location)

leaflet(PS6) %>%
  setView(lng = -91.5209, lat = 30.5191, zoom = 7) %>%
  addProviderTiles("Esri.WorldStreetMap") %>%
  addCircles(
    data = PS6,
    radius = PS6$TOTAL *6,
    color = ~palette(Location), 
    popup = Publicpop
  )

Assuming "longitude" and "latitude" are longitude and latitude, respectively

A. I used the str_detect function to filter the data for rows corresponding to Louisiana. Then, I applied another filter to exclude rows with NAvalues in the Grade_Level and Student_Teacher_Ratio columns. Finally, I used the mutate function to add a new column, Location, which categorizes each row as either Rural, City, Town, or Suburban. B. As for background information, I got all this information from a 2017-18 survey National Center for Education Statistics (NCES). NCES conducts these surveys to have a better understanding of school demographics and this data is used by the Department of Education to help determine how much funding each state/district should receive. C. I heavily struggled putting together my high charter graph. For whatever reason I wasn’t able to include the Suburban category, even when selecting a District that was fully in a suburban neighborhood. However, as previously stated, these schools were mostly alternative schools—so if I wanted to focus on regular schools this graph would still work. Another Area that became a challenge was uploading to Rpubs. My document seemingly became corrupted when I tried to upload my quarto document and would stay that way no matter how many times I updated the document. I eventually had to make a new document and then it worked.I wish the student teacher wasn’t aggregated so we could get a better understanding of the teacher student ratio by grade