library(ggplot2)
library(dslabs)
str(admissions)
## 'data.frame': 12 obs. of 4 variables:
## $ major : chr "A" "B" "C" "D" ...
## $ gender : chr "men" "men" "men" "men" ...
## $ admitted : num 62 63 37 33 28 6 82 68 34 35 ...
## $ applicants: num 825 560 325 417 191 373 108 25 593 375 ...
# Percent admitted vs. number of applicants, one color per major
scatter1 <- ggplot(data = admissions,
                   aes(x = applicants,
                       y = admitted,
                       color = major)) +
  geom_point(size = 5) +
  theme_dark() +
  # Overlay a smaller, semi-transparent marker whose shape also encodes major
  geom_point(aes(shape = major), alpha = 0.8) +
  scale_color_manual(values = c("lightskyblue", "plum1", "darkblue",
                                "cyan", "lightgreen", "darkgreen")) +
  labs(title = "Admission to UC Berkeley based on Major",
       x = "Number of Applicants",
       y = "Percent Admitted to UC Berkeley",
       caption = "Data from DSLabs")
scatter1
I used the data set titled “admissions”, which has information about students admitted to UC Berkeley. Variables in the data set include major, gender, percentage of students admitted, and number of applicants. To create my scatterplot, I put the percentage admitted along the y axis, the number of applicants along the x axis, and color coded the points by major. This way you can see the difference in the percentage of admissions and the number of applicants among the various majors, with two points per major (one for each gender). From the graph, we can conclude that the highest percentage of admitted students was seen in major A, even though that point has one of the lowest numbers of applicants. We can also see that the lowest percentages of admissions were in major F, as both of its points are at the bottom of the graph. Lastly, since there is no clear trend going upwards or downwards, we cannot draw any conclusions about whether the number of applicants has an effect on the percentage admitted.
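The readings taken from the graph can also be checked directly against the data. The following is a minimal sketch in base R, assuming the `dslabs` package is installed; the variable names (`top_admitted`, `fewest_applicants`, `lowest_rows`, `r`) are only illustrative and are not part of the original analysis.

```r
library(dslabs)

# Row with the highest admission percentage
top_admitted <- admissions[which.max(admissions$admitted), ]
print(top_admitted)

# Row with the fewest applicants, for comparison with the row above
fewest_applicants <- admissions[which.min(admissions$applicants), ]
print(fewest_applicants)

# Both rows (men and women) for the major with the lowest admission percentage
lowest_major <- admissions$major[which.min(admissions$admitted)]
lowest_rows <- admissions[admissions$major == lowest_major, ]
print(lowest_rows)

# Pearson correlation between number of applicants and percent admitted
r <- cor(admissions$applicants, admissions$admitted)
print(r)
```

With only 12 rows, the correlation is a rough numerical companion to the visual impression of the scatterplot, not a formal test of whether the number of applicants affects the admission rate.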