── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("dslabs") # this is the package that includes my dataset
Warning: package 'dslabs' was built under R version 4.4.3
library(ggrepel) # package to add text labels
Warning: package 'ggrepel' was built under R version 4.4.3
data(package="dslabs")
This is the dataset
data("admissions")
Removing NA values from each column
remove_na <- !is.na(admissions$major) & !is.na(admissions$gender) & !is.na(admissions$admitted) & !is.na(admissions$applicants) # TRUE for rows that have no NA in any column
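The column-by-column check above can also be sketched with base R's `complete.cases()`, which flags rows that have no missing value in any column. The toy data frame below is made up for illustration (it is not the `admissions` data), and the names `toy`, `keep_manual`, `keep_cc`, and `toy_clean` are hypothetical.

```r
# Made-up toy data that reuses the column names of `admissions`
toy <- data.frame(
  major = c("A", "B", NA, "D"),
  gender = c("men", "women", "men", NA),
  admitted = c(62, 68, 37, 35),
  applicants = c(825, 108, NA, 375)
)

# Column-by-column check, in the same style as remove_na
keep_manual <- !is.na(toy$major) & !is.na(toy$gender) &
  !is.na(toy$admitted) & !is.na(toy$applicants)

# One-call equivalent: TRUE for rows with no NA in any column
keep_cc <- complete.cases(toy)

# The logical vector only marks rows; subsetting is what drops them
toy_clean <- toy[keep_cc, ]

print(identical(keep_manual, keep_cc))
print(nrow(toy_clean))
```

Either logical vector has to be used to subset the data frame before plotting; building it alone leaves the data unchanged.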
This is the code for the graph
plot1 <- ggplot(admissions[remove_na, ], aes(x = applicants, y = admitted, color = gender)) + # Keep only rows without NA; assign variables to x and y axes
  geom_point(size = 3, alpha = 0.7) + # Define size and transparency of points
  geom_smooth(method = "lm", se = FALSE, linetype = 2) + # Use a linear model for the dashed trend line
  geom_text_repel(aes(label = gender), nudge_x = 0.005) + # Add text labels for gender
  labs(
    title = "Applicants vs Admitted Students", # Title of the graph
    x = "Number of Applicant Students", # Label for x-axis
    y = "Number of Admitted Students", # Label for y-axis
    color = "Gender", # Color based on gender
    caption = "source: Data Science Lab" # Caption for the source of the dataset
  ) +
  theme_minimal(base_size = 14) + # Minimal theme with adjusted font size
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold", color = "darkblue")) # Title customization
# Display the plot
plot1
`geom_smooth()` using formula = 'y ~ x'
I use the admissions dataset, which has 12 observations and 4 variables. After removing NA values from every column, I draw a scatterplot with the number of applicants on the x-axis and the number of admitted students on the y-axis to see how the two are related for each gender.

The fitted line for women slopes downward and the fitted line for men slopes upward, so the two lines cross. For women, the number of admitted students is high when the number of applicants is low, and it decreases as the number of applicants increases. For men, the number of admitted students increases as the number of applicants increases. Overall, the number of applicants and the number of admitted students are negatively correlated for women and positively correlated for men.
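The direction of each relationship can be checked numerically as well as read off the plot. The sketch below computes a Pearson correlation within each group using base R only. `cor_by_group` is a hypothetical helper, and the toy values are made up so that the signs are obvious; they are not the real admissions figures.

```r
# Hypothetical helper: Pearson correlation of columns x and y within each group
cor_by_group <- function(df, x, y, group) {
  sapply(split(df, df[[group]]), function(d) cor(d[[x]], d[[y]]))
}

# Made-up toy data (not the real admissions values)
toy <- data.frame(
  gender = rep(c("men", "women"), each = 4),
  applicants = c(100, 200, 300, 400, 100, 200, 300, 400),
  admitted = c(10, 20, 30, 40, 40, 30, 20, 10)
)

# Named vector with one coefficient per gender;
# in this toy data it is positive for men and negative for women
r <- cor_by_group(toy, "applicants", "admitted", "gender")
print(r)
```

With the real data, a call along the lines of `cor_by_group(admissions, "applicants", "admitted", "gender")` would return one coefficient per gender. The sign of each coefficient matches the direction of the corresponding fitted line, since a simple linear regression slope and the Pearson correlation always share the same sign.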