For my project I used a data set from the National Center for Education Statistics that listed information on student enrollment: how many children were in each grade for each school, districts: which school district each school belonged to, student teacher ratio, address: where the school was located and more. I decided to focus on Louisiana because I was curious in seeing the differences in rural and city enrollment. I chose Lousisiana specifically because I wanted a state that had a good ratio of both as to make sure my visualization would be interesting. Majority of my time on this project consisted of cleaning: filtering out what variables I wanted, converting formats and making sure to remove all the NAs (there was a lot)
From lines 11-15 I load in the appropriate libraries and call in my Public School dataset
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 100729 Columns: 79
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (21): SURVYEAR, STABR, ST_LEAID, LEA_NAME, SCH_NAME, LSTREET1, LSTREET2,...
dbl (55): X, Y, OBJECTID, NCESSCH, LEAID, LZIP, LZIP4, STATUS, TOTFRL, FRELC...
lgl (3): LSTREET3, G13, AE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
From lines 19-23 I select 25 out the original 79 variables I plan on using in throughout my code and call it PS1. I then summon a table to compare how many observations each state/territory have, and I end up choosing to go with Louisiana (LA)
AK AL AR AS AZ BI CA CO CT DC DE FL GA
513 1478 1087 28 2414 174 10323 1900 1031 228 229 4375 2307
GU HI IA ID IL IN KS KY LA MA MD ME MI
41 294 1332 756 4245 1924 1319 1534 1390 1854 1420 599 3734
MN MO MS MT NC ND NE NH NJ NM NV NY OH
2558 2434 1060 822 2691 525 1095 495 2595 884 734 4808 3610
OK OR PA PR RI SC SD TN TX UT VA VI VT
1800 1249 2990 1121 317 1264 703 1782 9320 1066 2124 28 311
WA WI WV WY
2427 2283 735 369
From lines 27-34 I use the filter string detect to select only Louisiana, and then I filter again to remove any NAs from the grades and Student teacher Ratio. I excluded Kindergarten and Pre-K becuase most of their listings were just 0 and didn’t add anything. I also summon two more tables because I wanted to get a better understanding for the ranges of locale as well as student teacher ratio.
From lines 44-59 I use the function mutate to create an additional column called “Location” that assigns each row to either Rural, City, Town, or Suburban (Note: almost all suburban schools were listed as alternative schools or juvenile centers). I then Pivoted to long format so highcharter could read my data more effectively
Call:
lm(formula = STUTERATIO ~ Student_Count, data = PS5)
Residuals:
Min 1Q Median 3Q Max
-11.884 -3.544 -1.214 1.548 69.466
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.213596 0.414284 34.309 < 2e-16 ***
Student_Count 0.064013 0.008429 7.594 6.93e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.335 on 1030 degrees of freedom
Multiple R-squared: 0.05302, Adjusted R-squared: 0.0521
F-statistic: 57.67 on 1 and 1030 DF, p-value: 6.933e-14
From lines 63-67 I made table to see which cities i could potentially use for my graph. I choose three cities that all had the same amount of observations (12) and fell into one of the three categories: rural, City, Town.
table(PS5$LCITY)
Alexandria Atlanta Bastrop Baton Rouge Bell City
12 12 12 60 12
Bogalusa Bossier City Braithwaite Bridge City Calvin
12 12 12 12 12
Cameron Castor Centerville Converse Coushatta
12 12 12 12 12
Delhi Dodson Downsville Doyline Elizabeth
12 12 12 12 12
Elmer Epps Evans Farmerville Ferriday
12 12 12 12 12
Florien Forest Georgetown Gibsland Glenmora
12 12 12 12 24
Grambling Grand Chenier Grand Isle Grant Hackberry
12 12 12 12 12
Hammond Harahan Harrisonburg Holden Hornbeck
12 12 12 12 12
Houma Jefferson Jennings Jonesboro Jonesville
12 12 12 12 12
Kilbourne Lacassine Lafayette Lake Charles Leesville
12 12 12 12 12
Lena Logansport Mandeville Mansura Maurepas
12 24 12 12 12
Merryville Monroe Monterey Mt. Hermon Negreet
12 24 12 12 12
New Orleans Noble Pitkin Pleasant Hill Quitman
36 12 12 12 12
Reeves Saline Sicily Island Simpson Simsboro
12 12 12 12 12
Singer St. Gabriel Starks Sulphur Summerfield
12 12 12 12 24
Thibodaux
12
Form lines 72-91 I made line chart using high charter that displays the Rural, City, and Town student counts along grades 1st-12th. When you hover over each point it will list the grade count and then the Student-Teacher ratio. The data set already came with the student teacher ratio aggregated, so it doesn’t change.
From lines 110-124 created a Map showing all the Louisiana schools, and used the color pallette Set2 based of Location. Gree is city, Orange is rural, Purple is town, Blue is Suburban
Assuming "longitude" and "latitude" are longitude and latitude, respectively
A. I used the str_detect function to filter the data for rows corresponding to Louisiana. Then, I applied another filter to exclude rows with NAvalues in the Grade_Level and Student_Teacher_Ratio columns. Finally, I used the mutate function to add a new column, Location, which categorizes each row as either Rural, City, Town, or Suburban. B. As for background information, I got all this information from a 2017-18 survey National Center for Education Statistics (NCES). NCES conducts these surveys to have a better understanding of school demographics and this data is used by the Department of Education to help determine how much funding each state/district should receive. C. I heavily struggled putting together my high charter graph. For whatever reason I wasn’t able to include the Suburban category, even when selecting a District that was fully in a suburban neighborhood. However, as previously stated, these schools were mostly alternative schools—so if I wanted to focus on regular schools this graph would still work. Another Area that became a challenge was uploading to Rpubs. My document seemingly became corrupted when I tried to upload my quarto document and would stay that way no matter how many times I updated the document. I eventually had to make a new document and then it worked.I wish the student teacher wasn’t aggregated so we could get a better understanding of the teacher student ratio by grade