dslabs HW

Author

J Amaya

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Desktop/Desktop - Jackie’s MacBook Pro/DATA 110")
library("dslabs")
library(RColorBrewer)
library(extrafont)
Registering fonts with R
#data(package="dslabs")
data(nyc_regents_scores)
scores <- nyc_regents_scores |> # pivot to long format so the data is easier to plot
  pivot_longer(cols = 2:6, # gathers columns 2-6 (one per subject)
               names_to = "subject", # the old column names go into a new "subject" column
               values_to = "count") # the score counts go into a single "count" column
scores <- scores |> # replacing "_" in the subject names so they display better in the visualization
  mutate(subject = str_replace_all(subject, "_", " "),
         subject = str_to_title(subject)) # title-case the subject names so they are presentable
ggplot(scores, aes(score, subject, fill = count)) +
  
  geom_rect(aes(xmin = 65, xmax = 102, ymin = -Inf, ymax = Inf), fill = "lightgreen") + # shades the passing range in green; found by researching how to fill a background color based on data. Source: https://www.geeksforgeeks.org/r-language/using-geomrect-for-time-series-shading-in-r/
  
  geom_tile(color = "white", width = 1.5, height = .95) +
  scale_x_continuous(expand=c(0,0)) +
  
  scale_fill_gradientn(colors = brewer.pal(9, "PuBu"), trans = "sqrt") +
  
  geom_vline(xintercept = 65, col = "lightgreen", linewidth = 1.8) + # added a line to mark the passing score of 65
  theme_minimal(base_family = "Courier") + # changed font
  labs(
    y = "Subject",
    x = "Score" ,
    fill = "Count",
    title = "NYC Regents Test Scores by Subject",
    caption = "Source: dslabs nyc_regents_scores dataset\nPassing score is 65 (represented by green outline)")
Warning: Removed 5 rows containing missing values or values outside the scale range
(`geom_tile()`).

For this assignment, I used the nyc_regents_scores dataset from dslabs to create a heat map of the score results. I took inspiration from the heat map example used for the disease dataset in this week’s tutorial. To start, I used pivot_longer to organize the subjects into one column and the counts into another. When I first rendered my heat map, I did not like the “_” in names such as “global_history”, so I used str_replace_all to replace the underscores with spaces and str_to_title to capitalize the first letter of each word. I found the heat map boring to look at, so after researching the passing score for the NYC Regents exams, I added a green outline around the passing scores to catch the viewer’s attention. I kept the xintercept line because it makes the green fill look more like an outline than it does without it.
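
To make the reshaping and renaming steps easier to follow, here is a minimal sketch on a toy table. The table `toy` and its counts are made up for illustration and are not taken from nyc_regents_scores; only the column names mirror the real dataset.

```r
library(tidyverse)

# Hypothetical stand-in for nyc_regents_scores: one row per score,
# one column per subject. The counts are invented for this example.
toy <- tibble(
  score = c(64, 65),
  global_history = c(10, 25),
  us_history = c(8, 30)
)

toy_long <- toy |>
  # every column except score becomes a (subject, count) pair
  pivot_longer(cols = -score, names_to = "subject", values_to = "count") |>
  # swap underscores for spaces, then capitalize the first letter of each word
  mutate(subject = str_to_title(str_replace_all(subject, "_", " ")))

toy_long
```

Each original row turns into one row per subject, so the two-row toy table becomes four rows. Note that str_to_title capitalizes only the first letter of each word, so "us_history" is displayed as "Us History" rather than "US History".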

Code for the green background fill came from https://www.geeksforgeeks.org/r-language/using-geomrect-for-time-series-shading-in-r/