
image source: https://www.brianleaf.com/images/sat-exam-score-highlighted.jpg
For my project 2, I decided I wanted to look at SAT test scores in New York. The dataset came from NYC.org and I found it through kaggle. It contains around 400 different observations which includes test scores, school, borough, latitude and longitude of the school, etc. The whole reason I chose this dataset is because I am an early college student and I plan to transfer once I graduate, and a big part of acceptance into certain schools is your SAT score so I thought it’d be interesting to do my project on this since it may be relevant for when I decide to transfer just as a comparison although it was the 2014-2015 school year.
library(tidyverse)
library(tidyr)
library(leaflet)
library(knitr)
library(plotly)
setwd("C:/Users/pickl/OneDrive")
scores <- read_csv("scores.csv")
scores_clean <- scores |>
group_by(Borough)|>
summarise(
`Average Math SAT` = mean(`Average Score (SAT Math)`, na.rm = TRUE),
`Average Reading SAT` = mean(`Average Score (SAT Reading)`, na.rm = TRUE),
`Average Writing SAT` = mean(`Average Score (SAT Writing)`, na.rm = TRUE),
.groups = "drop"
)
print(scores_clean)
## # A tibble: 5 × 4
## Borough `Average Math SAT` `Average Reading SAT` `Average Writing SAT`
## <chr> <dbl> <dbl> <dbl>
## 1 Bronx 404. 403. 396.
## 2 Brooklyn 416. 411. 403.
## 3 Manhattan 456. 445. 439.
## 4 Queens 462. 443. 440.
## 5 Staten Island 486. 478. 474.
scores_long <- scores_clean |>
pivot_longer(
cols = c(`Average Math SAT`, `Average Reading SAT`, `Average Writing SAT`),
names_to = "Test_Subject",
values_to = "Average_Score" # https://tidyr.tidyverse.org/reference/pivot_longer.html #link for values_to and names_to command
)
scores_plot <- scores_long |>
ggplot(aes(x = Borough, y = Average_Score, fill = Borough, text = paste("Borough: ", Borough, "<br>", "Score: ", round(Average_Score, 0)))) +
geom_col() +
facet_wrap(~ Test_Subject) +
labs(
title = "Average SAT Scores by Subject and Borough",
x = NULL,
y = "Average SAT Score",
caption = "Data Source: NYC.org (via Kaggle)"
) +
theme_minimal() +
scale_fill_brewer(palette = "Set2")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(scores_plot, tooltip = "text")
scores_clean2 <- scores |>
mutate(across(starts_with("Average Score (SAT"), ~as.numeric(.x))) |> #https://dplyr.tidyverse.org/reference/across.html #source for across function within the mutate
mutate( SAT_SCORE = `Average Score (SAT Math)` +
`Average Score (SAT Reading)` +
`Average Score (SAT Writing)` ) |>
filter(!is.na(SAT_SCORE)) |>
select(School_Name = 'School Name',Latitude, Longitude, SAT_SCORE)
print(head(scores_clean2))
## # A tibble: 6 × 4
## School_Name Latitude Longitude SAT_SCORE
## <chr> <dbl> <dbl> <dbl>
## 1 New Explorations into Science, Technology and Ma… 40.7 -74.0 1859
## 2 Essex Street Academy 40.7 -74.0 1193
## 3 Lower Manhattan Arts Academy 40.7 -74.0 1261
## 4 High School for Dual Language and Asian Studies 40.7 -74.0 1529
## 5 Henry Street School for International Studies 40.7 -74.0 1197
## 6 Bard High School Early College 40.7 -74.0 1914
sat_score_palette <- colorNumeric(palette = "Greens", domain = scores_clean2$SAT_SCORE,
na.color = "transparent")
leaflet_point_map <- leaflet(scores_clean2) |>
setView(lng = -74, lat = 40.5, zoom = 10) |>
addTiles() |>
addCircleMarkers(
lng = ~Longitude,
lat = ~Latitude,
radius = 6,
stroke = FALSE,
fillOpacity = 0.8,
color = ~sat_score_palette(SAT_SCORE),
fillColor = ~sat_score_palette(SAT_SCORE),
popup = ~paste0("<b>", School_Name, "</b><br>",
"Total SAT Score: ", round(SAT_SCORE, 0)) #round code source: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Round
) |>
addLegend("bottomright", pal = sat_score_palette, values = ~SAT_SCORE,
title = "Total Average SAT Score",
opacity = 1)
leaflet_point_map
For my project 2, I created 2 visualizations which contain a heatmap made with leaflet and a faceted bar chart. The first of the visualizations was the faceted bar chart where I used group_by, summarise, and mean to calculate the average SAT score for each of the 3 subjects (math, reading, writing). For the leaflet I was going for a heatmap similar to what I saw in class for last weeks assignment. For the heatmap, I filtered it using mutate and specifically ~as.numeric was important to ensure that the math parts of the code would function. I used mutate again to sum the scores up and get an overall average test score for each school. I used filter(!is.na in order to get rid of the NA’s since there were a few schools where this was prevalent. Also another piece of code I think is important is the round() function where it made all the test scores an even, whole number on the heat map. As for the insights, the faceted bar chart clearly shows a pattern of the Bronx being the lowest and Staten Island being the highest in each of the 3 subjects on the SAT. The heatmap however, somewhat shows me different, I think since Staten Island has so few schools, it may have been skewed in the faceted bar chart, I think that possibly Manhattan has the highest test scores. The heatmap reaffirms that the bronx was the lowest though. Something that could maybe enhance then insights or give some contect is maybe funding for each school, student to teacher ratios, etc.