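library(dslabs)    # provides the admissions dataset
library(dplyr)     # provides mutate()
library(ggplot2)   # provides ggplot() and the geom_* functions
data("admissions") # The dataset I will be using from dslabs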
Making the graph
admission_rate <- admissions[-8, ] |> # removing row 8, which was skewing the data
  mutate(Acceptance_Percent = admitted / applicants) # creates the acceptance percent for each major/gender
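Because Acceptance_Percent is meant to be a proportion between 0 and 1, a quick check before plotting can show which rows fall outside that range, such as the row removed above. The sketch below is an illustrative check, not part of the original analysis. It recomputes the ratio on the full dataset without dropping row 8 and lists any rows whose ratio is not a valid proportion. It also assumes, based on the dslabs help page (`?admissions`), that `admitted` is stored as the percent of applicants admitted rather than as a count. If that holds, `admitted / 100` would be another way to get a decimal rate, so the sketch computes it alongside for comparison. The names `rate_check`, `ratio`, `rate_if_percent`, and `ratio_in_range` are hypothetical.

```r
library(dslabs)
library(dplyr)

data("admissions")

# Recompute the ratio for every row (row 8 is NOT dropped here).
# rate_if_percent assumes `admitted` is already a percentage, as ?admissions describes it.
rate_check <- admissions |>
  mutate(
    ratio           = admitted / applicants,
    rate_if_percent = admitted / 100,
    ratio_in_range  = ratio >= 0 & ratio <= 1
  )

# List any rows whose ratio is not a valid proportion
rate_check |>
  filter(!ratio_in_range) |>
  select(major, gender, admitted, applicants, ratio)
```

If the filtered table is not empty, the ratio is producing values a proportion cannot take, which is worth resolving before reading trends off the graph.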
Adding two new libraries
library(ggrepel)
Warning: package 'ggrepel' was built under R version 4.3.3
library(ggthemes) # loading in the library
Warning: package 'ggthemes' was built under R version 4.3.3
My Final Graph
admission_rate |>
  ggplot(aes(x = applicants, y = Acceptance_Percent, label = gender)) + # Adding the x and y variables for the graph
  geom_point(aes(color = major), size = 3) + # making a point graph and a legend for the Berkeley majors based on color
  geom_smooth(method = lm, se = FALSE, color = "black", lty = 2, linewidth = 0.3) + # adding a dashed linear trend line
  theme_solarized() + # adding a different ggtheme
  geom_text_repel(nudge_x = 0.005) + # labeling each point as women or men with ggrepel so the labels don't overlap
  ylim(0, 1) + # limiting the y-axis so the graph maxes out at 1
  xlab("Total of Applicants") +
  ylab("Acceptance Percentage in Decimals") +
  ggtitle("The Berkeley Acceptance Percentage of Different Majors") + # Adding titles to the graph and the x & y axes
  scale_color_discrete(name = "Different Majors by Letters") # title the legend for the majors
`geom_smooth()` using formula = 'y ~ x'
Warning: The following aesthetics were dropped during statistical transformation: label.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Warning: Removed 19 rows containing missing values or values outside the scale range
(`geom_smooth()`).
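The two warnings above have common causes in ggplot2. Putting `label = gender` in the top-level `aes()` passes it to every layer, including `geom_smooth()`, which cannot use it. Mapping the label only inside `geom_text_repel()` avoids that. `ylim(0, 1)` sets scale limits, so any values outside them, including points on the fitted line, are turned into missing values and dropped. That is what the "Removed 19 rows" warning reports. `coord_cartesian(ylim = c(0, 1))` zooms the view instead of dropping values. The sketch below is an alternative version of the graph, not the one shown above. It applies both changes, passes an explicit `formula` to `geom_smooth()` (which skips the "using formula" message), and rebuilds `admission_rate` the same way so it runs on its own. The object name `p` is hypothetical.

```r
library(dslabs)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(ggthemes)

data("admissions")

# Same data preparation as above (row 8 dropped, ratio computed)
admission_rate <- admissions[-8, ] |>
  mutate(Acceptance_Percent = admitted / applicants)

p <- admission_rate |>
  ggplot(aes(x = applicants, y = Acceptance_Percent)) +    # no label in the shared aes()
  geom_point(aes(color = major), size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,   # explicit formula, no message
              color = "black", linetype = 2, linewidth = 0.3) +
  geom_text_repel(aes(label = gender), nudge_x = 0.005) +   # label mapped only where it is used
  coord_cartesian(ylim = c(0, 1)) +                          # zoom instead of dropping values
  theme_solarized() +
  labs(
    x = "Total of Applicants",
    y = "Acceptance Percentage in Decimals",
    title = "The Berkeley Acceptance Percentage of Different Majors",
    color = "Different Majors by Letters"
  )

p
```

If you would rather show percentages than decimals on the y-axis, adding `scale_y_continuous(labels = scales::percent)` formats 0.1 as 10%.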
Paragraph:
I decided to use the admissions dataset from Berkeley because I wondered whether the acceptance rate in each major was different for men and women. To create the graph, I first used the mutate function to make a new variable called Acceptance_Percent by dividing admitted by applicants. This gives a decimal, which makes the graph easier to read: each decimal stands for a percentage, for example 0.1 = 10% acceptance rate. Making the graph with ggplot was simple, as all it needed was the x variable (applicants) and the y variable (Acceptance_Percent). I used theme_solarized() from ggthemes, which looked very cool. I used ggrepel so that each dot is properly labeled as women or men; some dots were packed close together, and ggrepel kept their labels from overlapping. I used geom_smooth to draw a dashed linear trend line because I like to see a trend in the graphs I make. Finally, I used xlab, ylab, and ggtitle to give the x and y axes proper titles and to give the whole graph a proper title.

What I learned from my graph is that the more applicants a major had, the lower its acceptance percentage tended to be. The graph also shows how many women applied to each major compared to men, and vice versa. Most importantly, it answered my main question: within the same major, men and women did not always have the same acceptance percentage. In some Berkeley majors men had a higher acceptance percentage, and in others women did, which suggests some bias that depends on the major. It was a very cool dataset to work with, and I love all the new ways I learned to customize my graphs in the future.
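The main question, whether men and women had different acceptance percentages within the same major, can also be checked as a small table next to the graph. The sketch below is an illustration, not part of the original analysis. It reshapes `admission_rate` with `tidyr::pivot_wider()` so each major gets one row with a `men` column and a `women` column, then takes the difference. It uses Acceptance_Percent exactly as defined above. Because row 8 was removed, one major will show `NA` for women. The names `gender_gap` and `men_minus_women` are hypothetical.

```r
library(dslabs)
library(dplyr)
library(tidyr)

data("admissions")

# Same data preparation as above (row 8 dropped, ratio computed)
admission_rate <- admissions[-8, ] |>
  mutate(Acceptance_Percent = admitted / applicants)

# One row per major, with the men and women values side by side
gender_gap <- admission_rate |>
  select(major, gender, Acceptance_Percent) |>
  pivot_wider(names_from = gender, values_from = Acceptance_Percent) |>
  mutate(men_minus_women = men - women)

gender_gap
```

A positive `men_minus_women` means men had the higher value in that major, and a negative value means women did. This makes the "it depends on the major" pattern easy to read off one column.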