image source: https://www.brianleaf.com/images/sat-exam-score-highlighted.jpg

For my project 2, I decided I wanted to look at SAT test scores in New York. The dataset came from NYC.org and I found it through kaggle. It contains around 400 different observations which includes test scores, school, borough, latitude and longitude of the school, etc. The whole reason I chose this dataset is because I am an early college student and I plan to transfer once I graduate, and a big part of acceptance into certain schools is your SAT score so I thought it’d be interesting to do my project on this since it may be relevant for when I decide to transfer just as a comparison although it was the 2014-2015 school year.

Load necessary libraries and dataset

library(tidyverse)
library(tidyr)
library(leaflet)
library(knitr)
library(plotly)
setwd("C:/Users/pickl/OneDrive")
scores <- read_csv("scores.csv")

filtering the data to get the average score of each of the 3 tests by borough

scores_clean <- scores |>
  group_by(Borough)|>
summarise(
      `Average Math SAT` = mean(`Average Score (SAT Math)`, na.rm = TRUE), 
      `Average Reading SAT` = mean(`Average Score (SAT Reading)`, na.rm = TRUE),
      `Average Writing SAT` = mean(`Average Score (SAT Writing)`, na.rm = TRUE),
      .groups = "drop" 
  )
print(scores_clean)
## # A tibble: 5 × 4
##   Borough       `Average Math SAT` `Average Reading SAT` `Average Writing SAT`
##   <chr>                      <dbl>                 <dbl>                 <dbl>
## 1 Bronx                       404.                  403.                  396.
## 2 Brooklyn                    416.                  411.                  403.
## 3 Manhattan                   456.                  445.                  439.
## 4 Queens                      462.                  443.                  440.
## 5 Staten Island               486.                  478.                  474.

preparing data for faceted bar chart

scores_long <- scores_clean |>
  pivot_longer(
    cols = c(`Average Math SAT`, `Average Reading SAT`, `Average Writing SAT`), 
    names_to = "Test_Subject",              
    values_to = "Average_Score"  # https://tidyr.tidyverse.org/reference/pivot_longer.html  #link for values_to and names_to command         
  )

Making faceted bar chart

scores_plot <- scores_long |>
  ggplot(aes(x = Borough, y = Average_Score, fill = Borough, text = paste("Borough: ", Borough, "<br>", "Score: ", round(Average_Score, 0)))) + 
  geom_col() + 
  facet_wrap(~ Test_Subject) + 
  labs(
    title = "Average SAT Scores by Subject and Borough",
    x = NULL,
    y = "Average SAT Score",
    caption = "Data Source: NYC.org (via Kaggle)"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(scores_plot, tooltip = "text")

Filtering data to make heatmaps that include latitude and longitude

scores_clean2 <- scores |>
  mutate(across(starts_with("Average Score (SAT"), ~as.numeric(.x))) |> #https://dplyr.tidyverse.org/reference/across.html #source for across function within the mutate
  mutate( SAT_SCORE = `Average Score (SAT Math)` + 
                `Average Score (SAT Reading)` + 
                `Average Score (SAT Writing)` ) |>
  filter(!is.na(SAT_SCORE)) |>
  select(School_Name = 'School Name',Latitude, Longitude, SAT_SCORE)
print(head(scores_clean2))
## # A tibble: 6 × 4
##   School_Name                                       Latitude Longitude SAT_SCORE
##   <chr>                                                <dbl>     <dbl>     <dbl>
## 1 New Explorations into Science, Technology and Ma…     40.7     -74.0      1859
## 2 Essex Street Academy                                  40.7     -74.0      1193
## 3 Lower Manhattan Arts Academy                          40.7     -74.0      1261
## 4 High School for Dual Language and Asian Studies       40.7     -74.0      1529
## 5 Henry Street School for International Studies         40.7     -74.0      1197
## 6 Bard High School Early College                        40.7     -74.0      1914

Making heatmap using leaflet

sat_score_palette <- colorNumeric(palette = "Greens", domain = scores_clean2$SAT_SCORE, 
  na.color = "transparent")

leaflet_point_map <- leaflet(scores_clean2) |>  
  setView(lng = -74, lat = 40.5, zoom = 10) |>
  addTiles() |>  
  addCircleMarkers(
    lng = ~Longitude,    
    lat = ~Latitude,     
    radius = 6,          
    stroke = FALSE,      
    fillOpacity = 0.8,   
    color = ~sat_score_palette(SAT_SCORE), 
    fillColor = ~sat_score_palette(SAT_SCORE),
    popup = ~paste0("<b>", School_Name, "</b><br>",  
                    "Total SAT Score: ", round(SAT_SCORE, 0)) #round code source: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Round 
  ) |>  
  
  addLegend("bottomright", pal = sat_score_palette, values = ~SAT_SCORE,
    title = "Total Average SAT Score",
    opacity = 1)

leaflet_point_map

Essay Portion:

For my project 2, I created 2 visualizations which contain a heatmap made with leaflet and a faceted bar chart. The first of the visualizations was the faceted bar chart where I used group_by, summarise, and mean to calculate the average SAT score for each of the 3 subjects (math, reading, writing). For the leaflet I was going for a heatmap similar to what I saw in class for last weeks assignment. For the heatmap, I filtered it using mutate and specifically ~as.numeric was important to ensure that the math parts of the code would function. I used mutate again to sum the scores up and get an overall average test score for each school. I used filter(!is.na in order to get rid of the NA’s since there were a few schools where this was prevalent. Also another piece of code I think is important is the round() function where it made all the test scores an even, whole number on the heat map. As for the insights, the faceted bar chart clearly shows a pattern of the Bronx being the lowest and Staten Island being the highest in each of the 3 subjects on the SAT. The heatmap however, somewhat shows me different, I think since Staten Island has so few schools, it may have been skewed in the faceted bar chart, I think that possibly Manhattan has the highest test scores. The heatmap reaffirms that the bronx was the lowest though. Something that could maybe enhance then insights or give some contect is maybe funding for each school, student to teacher ratios, etc.