DS Labs HW

Author

Tikki Dibonge

library("dslabs")
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-results_us_election_2012.R"     
[25] "make-stars.R"                        
[26] "make-temp_carbon.R"                  
[27] "make-tissue-gene-expression.R"       
[28] "make-trump_tweets.R"                 
[29] "make-weekly_us_contagious_diseases.R"
[30] "save-gapminder-example-csv.R"        

Load the dslabs package, list its datasets, and list the scripts the package uses to build them.
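This optional sketch is not part of the original assignment. Instead of opening the dataset listing, it pulls the dataset names out as a character vector. It relies on data(package = ...) from base R's utils returning an object whose results matrix has an "Item" column. The names dataset_names and scripts are illustrative variable names.

```r
library(dslabs)

# data(package = ...) returns a "packageIQR" object; its results matrix
# has an "Item" column holding the dataset names.
dataset_names <- data(package = "dslabs")$results[, "Item"]
head(dataset_names)

# The make-*.R files listed above are the scripts used to build the datasets.
scripts <- list.files(system.file("script", package = "dslabs"))
length(scripts)
```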

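library(ggplot2)    # ggrepel also attaches ggplot2 ("Loading required package: ggplot2"), but loading it explicitly is clearer
library(ggthemes)
library(ggrepel)
library(extrafont)  # prints "Registering fonts with R" when loaded
library(ggnewscale) # https://cran.r-project.org/web/packages/ggnewscale/index.html
data("admissions")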

Load the necessary packages and the admissions data I’ll be using.
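Before plotting, it helps to check what the columns mean. This minimal sketch assumes the dslabs documentation's description of admissions: one row per major and gender, with applicants as a count and admitted as the percentage of applicants admitted. The admitted_n column is a hypothetical helper added here. Because the percentages are rounded, it is only an approximate count.

```r
library(dslabs)
data("admissions")

# One row per major (A-F) and gender: 6 majors x 2 genders = 12 rows.
str(admissions)

# `admitted` is a percentage, not a count; approximate the number admitted
# (hypothetical helper column, rounded because the percentages are rounded).
admissions$admitted_n <- round(admissions$admitted / 100 * admissions$applicants)
head(admissions)
```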

gender_colors <- c("men" = "#0072B2", "women" = "#F030AF")
major_colors <- c(
  "A" = "#E41A1C",
  "B" = "#377EB8",
  "C" = "#4DAF4A",
  "D" = "#984EA3",
  "E" = "#FF7F00",
  "F" = "#A65628"
) # Define colors for the scatterplot points (men and women) and for the labels of majors A-F.
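scale_color_manual() matches named values to data values by name, so a typo in a name would leave that gender or major without its intended color. The sketch below is a quick check for that. It assumes the gender values in admissions are "men" and "women", as the names above imply. The names missing_gender and missing_major are illustrative.

```r
library(dslabs)
data("admissions")

gender_colors <- c("men" = "#0072B2", "women" = "#F030AF")
major_colors <- c("A" = "#E41A1C", "B" = "#377EB8", "C" = "#4DAF4A",
                  "D" = "#984EA3", "E" = "#FF7F00", "F" = "#A65628")

# Any value present in the data but missing from the named color vector
# would not receive its intended color in scale_color_manual().
missing_gender <- setdiff(as.character(unique(admissions$gender)), names(gender_colors))
missing_major  <- setdiff(as.character(unique(admissions$major)), names(major_colors))
stopifnot(length(missing_gender) == 0, length(missing_major) == 0)
```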

ggplot(admissions, aes(x = applicants, y = admitted)) +
  geom_point(aes(color = gender), size = 4, alpha = 0.8) + # Color the points by gender and set their size.
  scale_color_manual(name = "Gender", values = gender_colors) + # Apply the gender colors to the points.
  ggnewscale::new_scale_color() + # Start a second, separate color scale for the major labels.
  geom_text_repel(
    aes(label = major, color = major),
    size = 2,
    fontface = "bold",
    box.padding = 0.5, # https://rdrr.io/cran/ggrepel/man/geom_text_repel.html
    point.padding = 0.05,
    segment.size = 0.4
  ) + # Label each point with its major in that major's color; set the label size, bold face, distance from the point, and connector line thickness.
  scale_color_manual(name = "Major", values = major_colors) + # Apply the major colors to the labels.
  labs(
    title = "Admissions by Major between Men and Women",
    x = "Number of Applicants",
    y = "Percentage of Applicants Admitted" # In dslabs, `admitted` is a percentage, not a count.
  ) +
  theme_bw(base_size = 14, base_family = "Times") +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, color = "#2C3E50"), # https://tidyverse.org/blog/2025/05/fonts-in-r/
    axis.title.x = element_text(color = "#34495E", face = "bold"),
    axis.title.y = element_text(color = "#34495E", face = "bold"),
    legend.position = "right"
  ) # Add the plot and axis titles, set the theme, font family, and base font size, make the titles bold and colored, and place the legends on the right.

Essay:

The data set I chose was the admissions data set. For each of majors A-F, it gives the number of applicants and the percentage of applicants admitted, separately for men and women. I created my scatter plot by first loading the necessary packages: ggthemes, ggrepel, extrafont, and ggnewscale. I installed ggnewscale because my first version of the graph had only one legend, for gender. I also needed to show each major and its color to accurately present the difference between men and women within a specific major.

I then defined named color values for the majors and for the genders so they could be mapped onto the plot. I created the scatter plot and set each point's size and its color to match the gender it represents. I used ggnewscale to start a new color scale containing only the major colors, which is used for the point labels and the second legend. I made the point labels show the major in its associated color, set the label font size, and made the labels bold. I then set the distance of each label from its point and the thickness of the label line. Finally, I added the plot title and the x and y axis titles, changed the theme and font, made the titles bold and a specific color, and positioned the legends on the right. The scatter plot now shows, for each major, how the number of applicants and the percentage admitted differ between men and women.
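The reason for using ggnewscale can be shown on a tiny example. This sketch uses hypothetical toy data: toy, group, category, and the colors are made up for illustration. Layers added before new_scale_color() use the first color scale. Layers added after it use a second, independent scale, so the plot gets two color legends.

```r
library(ggplot2)
library(ggnewscale)

# Hypothetical toy data: two groups of points, each tagged with a category.
toy <- data.frame(
  x = 1:4,
  y = c(2, 3, 1, 4),
  group = c("g1", "g1", "g2", "g2"),
  category = c("A", "B", "A", "B")
)

p <- ggplot(toy, aes(x, y)) +
  geom_point(aes(color = group), size = 3) +
  scale_color_manual(name = "Group", values = c(g1 = "#0072B2", g2 = "#F030AF")) +
  # Layers added after this call use a second, independent color scale.
  new_scale_color() +
  geom_text(aes(label = category, color = category), vjust = -1) +
  scale_color_manual(name = "Category", values = c(A = "#E41A1C", B = "#377EB8"))

print(p)  # draws the plot with a "Group" legend and a "Category" legend
```

Without new_scale_color(), both layers would share one color scale, and the second scale_color_manual() would replace the first.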