── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
    score integrated_algebra global_history living_environment english us_history
97     96                125            547                403     729        972
98     97                110           1229                446    1071       3039
99     98                 55            764                 87     171       2074
100    99                 19            499                 NA     638       1710
101   100                 NA             NA                 NA      NA         NA
102    NA                148             65                 95      86         83
The data set has 6 variables. The first is the score on each test, from 0 to 100. The other 5 give the number of applicants who achieved that score on each test: integrated algebra, global history, living environment, English, and US history. In other words, they give the frequency with which each score was achieved. The goal now is to see where these applicants' scores fall among the letter grades A, B, C, D, and F.
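To make the grading idea concrete, here is a minimal sketch of how scores could be binned into letter grades and the frequencies summed per grade. The cut points (F below 60, then ten-point bands up to A) are an assumption taken from the shaded bands used in the plots further down, and the `score` and `freq` values are made up for illustration.

```r
# Illustrative scores and how many applicants achieved each (not real data)
score <- c(55, 60, 69, 70, 85, 90, 100)
freq  <- c(10, 20, 30, 40, 50, 60, 70)

# Assumed grade bands: [0,60) F, [60,70) D, [70,80) C, [80,90) B, [90,100] A
grade <- cut(
  score,
  breaks = c(0, 60, 70, 80, 90, 100),
  labels = c("F", "D", "C", "B", "A"),
  right = FALSE,          # intervals are closed on the left
  include.lowest = TRUE   # so that a score of 100 still counts as an A
)

# Total number of applicants per letter grade
totals <- tapply(freq, grade, sum)
totals
```

With real data, `score` and `freq` would be one test's column pair from the table above.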
Data manipulation
Another thing we can see is that the data contains many NAs. In the test columns, an NA means that no one achieved that score on that test. The only NA in the score column represents the candidates who did not take the test, which I will fold into the score of 0, because that is what happens when someone does not take a test.
# Replace every NA with 0; this also turns the NA score into a score of 0
datanona <- replace(nyc_regents_scores, is.na(nyc_regents_scores), 0)
# Sum the rows that now share a score (the original 0 row and the former NA row)
datanona <- datanona %>%
  group_by(score) %>%  # group the columns you want to "leave alone"
  summarise(across(everything(), sum))  # across() replaces the deprecated summarise_each(funs(sum))
Now we must merge the test columns into a single long-format variable so that ggplot understands them all as measuring the same thing. This makes it easier to build a beautiful plot, and it is also a chance to make the variable names more understandable to those who are not familiar with the data.
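The reshaping code itself is not shown in this write-up. A minimal sketch of one way to do it follows, assuming `tidyr::pivot_longer()` and the column names `Scores`, `variable`, and `value` that the plotting code expects. The toy `datanona` and the display labels are illustrative assumptions, not the author's actual code or data.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `datanona` (values are illustrative, not the real counts)
datanona <- tibble(
  score              = c(0, 55, 65),
  integrated_algebra = c(10, 20, 30),
  global_history     = c(11, 21, 31),
  living_environment = c(12, 22, 32),
  english            = c(13, 23, 33),
  us_history         = c(14, 24, 34)
)

newdata <- datanona %>%
  rename(Scores = score) %>%
  # One row per (score, test) pair instead of one column per test
  pivot_longer(-Scores, names_to = "variable", values_to = "value") %>%
  # Hypothetical reader-friendly labels for the legend
  mutate(variable = recode(
    variable,
    integrated_algebra = "Integrated Algebra",
    global_history     = "Global History",
    living_environment = "Living Environment",
    english            = "English",
    us_history         = "US History"
  ))

newdata
```

The long format lets a single `colour = variable` mapping draw one line per test.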
Now it is time to plot everything and see how it looks.
plot <- ggplot(newdata, aes(Scores, value, colour = variable)) +
  # Shaded background bands, one per score range
  geom_rect(aes(xmin = 0, xmax = 60, ymin = 0, ymax = 8500), alpha = 0.3, fill = "gray", color = NA) +
  geom_rect(aes(xmin = 60, xmax = 70, ymin = 0, ymax = 8500), color = NA, fill = "white", alpha = 0.3) +
  geom_rect(aes(xmin = 70, xmax = 80, ymin = 0, ymax = 8500), color = NA, fill = "slategray1", alpha = 0.3) +
  geom_rect(aes(xmin = 80, xmax = 90, ymin = 0, ymax = 8500), color = NA, fill = "slateblue1", alpha = 0.3) +
  geom_rect(aes(xmin = 90, xmax = 100, ymin = 0, ymax = 8500), color = NA, fill = "blue", alpha = 0.1) +
  geom_line() +
  geom_point() +
  ggtitle("NYC-Regents Scores") +
  xlab("Score") +  # was a second ylab() call, which the next line overrode
  ylab("Frequency") +
  scale_color_brewer(palette = "Set1") +
  theme_minimal() +
  theme(legend.position = "top") +
  scale_x_continuous(limits = c(0, 100)) +
  scale_y_continuous(limits = c(0, 8500))
plot
A color change:
plot <- ggplot(newdata, aes(Scores, value, colour = variable)) +
  # Same bands as before, with the lowest range now in red
  geom_rect(aes(xmin = 0, xmax = 60, ymin = 0, ymax = 8500), alpha = 0.3, fill = "red3", color = NA) +
  geom_rect(aes(xmin = 60, xmax = 70, ymin = 0, ymax = 8500), color = NA, fill = "white", alpha = 0.3) +
  geom_rect(aes(xmin = 70, xmax = 80, ymin = 0, ymax = 8500), color = NA, fill = "slategray1", alpha = 0.3) +
  geom_rect(aes(xmin = 80, xmax = 90, ymin = 0, ymax = 8500), color = NA, fill = "slateblue1", alpha = 0.3) +
  geom_rect(aes(xmin = 90, xmax = 100, ymin = 0, ymax = 8500), color = NA, fill = "blue", alpha = 0.1) +
  geom_line() +
  geom_point() +
  ggtitle("NYC-Regents Scores") +
  xlab("Score") +  # was a second ylab() call, which the next line overrode
  ylab("Frequency") +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(legend.position = "top") +
  scale_x_continuous(limits = c(0, 100)) +
  scale_y_continuous(limits = c(0, 8500))
plot
We can see that many applicants would not have passed. The great majority of Regents test-takers would have earned only mediocre grades.
Future improvements
The plots are nice, but unfortunately there is room for improvement: the alpha was not working and there are many dots. Switching to percentages could be nice, as could putting the labels for every score band (A, B, C, D, F) on the graph.
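The improvements listed above can be sketched as follows. One likely reason the alpha appeared not to work is that a `geom_rect()` with constant `aes()` values, inheriting the plot's data, draws one rectangle per data row, so the translucent layers stack until they look opaque. Giving the bands their own small data frame draws each rectangle once. The toy `newdata`, the `bands` data frame, the grade-to-band mapping, and the `pct` column are all assumptions made for illustration.

```r
library(ggplot2)

# Toy long-format data standing in for `newdata` (values are illustrative only)
newdata <- data.frame(
  Scores   = rep(c(50, 65, 75, 85, 95), times = 2),
  variable = rep(c("English", "US History"), each = 5),
  value    = c(100, 900, 700, 400, 200, 150, 1200, 2500, 3000, 1000)
)

# Share of each test's applicants at each score, instead of raw counts
newdata$pct <- ave(newdata$value, newdata$variable, FUN = function(v) v / sum(v))

# Assumed grade bands: F below 60, then D, C, B, A in ten-point steps
bands <- data.frame(
  grade = c("F", "D", "C", "B", "A"),
  xmin  = c(0, 60, 70, 80, 90),
  xmax  = c(60, 70, 80, 90, 100),
  fill  = c("red3", "white", "slategray1", "slateblue1", "blue")
)

p <- ggplot(newdata, aes(Scores, pct, colour = variable)) +
  # One rectangle per band, drawn once, so alpha behaves as expected
  geom_rect(
    data = bands,
    aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = Inf, fill = fill),
    alpha = 0.3, colour = NA, inherit.aes = FALSE
  ) +
  scale_fill_identity() +  # use the colour names in `fill` as-is
  # Letter grade label at the top of each band
  geom_text(
    data = bands,
    aes(x = (xmin + xmax) / 2, y = Inf, label = grade),
    vjust = 1.5, colour = "black", inherit.aes = FALSE
  ) +
  geom_line() +  # lines only, which avoids the crowd of dots
  ggtitle("NYC-Regents Scores") +
  xlab("Score") +
  ylab("Share of applicants") +
  scale_color_brewer(palette = "Dark2") +
  scale_x_continuous(limits = c(0, 100)) +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  theme(legend.position = "top")

p
```

Dropping `geom_point()` is only one way to reduce the clutter; keeping the points with a smaller `size` would be another.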