dslabs Homework

Author

Balemlay Azimeraw

Packages

library(tidyverse) # this is the package
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("dslabs") # package that contains my dataset
Warning: package 'dslabs' was built under R version 4.4.3
library(ggrepel) # package for adding non-overlapping text labels
Warning: package 'ggrepel' was built under R version 4.4.3
data(package="dslabs")

Loading the admissions dataset

data("admissions") 
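Before plotting, it can help to confirm the shape and column types of the data. A minimal sketch, assuming the dslabs package is installed; it only inspects the data and changes nothing:

```r
library(dslabs)
data("admissions")

dim(admissions)    # number of rows and number of columns
str(admissions)    # column names and their types
head(admissions)   # first few rows
```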

Removing rows with NA values in any column

remove_na <- !is.na(admissions$major) & !is.na(admissions$gender) & !is.na(admissions$admitted) & !is.na(admissions$applicants)  # TRUE for rows with no NA in any column
admissions <- admissions[remove_na, ]  # keep only the complete rows
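The column-by-column check above can also be written more compactly. The sketch below is one possible alternative, not the only way to do it. It uses a small made-up data frame named `toy` (hypothetical values, not the admissions data) so that the effect of dropping rows is visible:

```r
library(tidyr)

# Hypothetical toy data with missing values (NOT the real admissions data)
toy <- data.frame(
  major      = c("A", "B", NA, "D"),
  gender     = c("men", "women", "men", NA),
  admitted   = c(50, NA, 40, 30),
  applicants = c(800, 100, 300, 400)
)

# Base R: complete.cases() is TRUE only for rows with no NA in any column
toy_base <- toy[complete.cases(toy), ]

# tidyr: drop_na() with no column arguments drops rows with an NA in any column
toy_tidy <- drop_na(toy)

nrow(toy_base)
nrow(toy_tidy)
```

Both approaches keep only the first row of `toy`, because each of the other rows has one missing value.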

This is the code for the graph

plot1 <- ggplot(admissions, aes(x = applicants, y = admitted, color = gender)) +  # Assign variables to x and y axis
  geom_point(size = 3, alpha = 0.7) + # Define size and transparency of points
  geom_smooth(method = "lm", se = FALSE, linetype = 2) + # Fit a dashed linear trend line for each gender
  geom_text_repel(aes(label = gender), nudge_x = 0.005) + # Add text labels for gender
  labs(
    title = "Applicants vs Admitted Students", # Title of the graph
    x = "Number of Applicant Students",  # Label for x-axis
    y = "Admitted (%)",  # Label for y-axis (admitted is a percentage in dslabs)
    color = "Gender",       # Color based on gender
    caption = "source: Data Science Lab" # Caption for the source of the dataset                       
  ) +
  theme_minimal(base_size = 14) +  # Minimal theme with adjusted font size
  theme(
    plot.title = element_text(hjust = 0.5, size = 18, face = "bold", color = "darkblue")) # Title customization
  

# Display the plot
plot1
`geom_smooth()` using formula = 'y ~ x'

I use the admissions dataset from dslabs. It has 12 observations and 4 variables: major, gender, admitted, and applicants. After removing any rows with NA values, I draw a scatterplot with the number of applicants on the x-axis and admitted on the y-axis to see how the two are related for each gender. Note that the dslabs documentation defines admitted as the percentage of applicants who were admitted, not a count of students.

From the graph, the trend line for women slopes downward and the trend line for men slopes upward, and the two lines cross. For women, the admitted value is high when the number of applicants is low, and it decreases as the number of applicants increases. For men, the admitted value increases as the number of applicants increases. Overall, applicants and admitted are negatively correlated for women and positively correlated for men.
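The direction of each trend can also be checked numerically instead of only by eye. A minimal sketch, assuming dslabs is installed and that the gender column uses the labels "men" and "women"; the names `by_gender`, `cor_by_gender`, and `slope_by_gender` are introduced here for illustration only:

```r
library(dslabs)
data("admissions")

# Split the rows into one data frame per gender
by_gender <- split(admissions, admissions$gender)

# Pearson correlation between applicants and admitted within each gender
cor_by_gender <- sapply(by_gender, function(d) cor(d$applicants, d$admitted))

# Slope of the fitted line admitted ~ applicants within each gender
slope_by_gender <- sapply(
  by_gender,
  function(d) coef(lm(admitted ~ applicants, data = d))[["applicants"]]
)

cor_by_gender
slope_by_gender
```

A negative value for women and a positive value for men would agree with what the plot shows. With so few points per group, these numbers describe the sample and should not be read as strong evidence on their own.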