Below is the Northwestern University Admission graphic that you can find on Northwestern’s official website [https://www.adminplan.northwestern.edu/ir/data-book/]. They overlaid two plots on one another by using dual y-axes. However, I found that this graph is particuarly difficult to read and less visually pleasing to the readers.
As a result, I decided to recreate two separate plots that display the same information instead of trying to put it all in one single plot. I obtained the NU_admission_data.csv data from Northwestern University. I used Northwestern colors and fonts indicated on their branding tool website [https://www.northwestern.edu/brand/visual-identity/color-palettes/].
A major error of the official Northwestern graphic
Stacking the bars is a major error they make with the bars in their graphic. This makes it hard to show the total applicants per year, and the bar adds up over the applicant numbers. Each variable (e.g. applications, admitted students, matriculants) is a subset of another one, and stacking them does not help visualize these clearly. In order to make a easily readable graph, it would be better to manipulate data and use shades to represent the subsets of the variables in the bar graph.
Using Single dual y-axes plot vs. Two separate plot
I find that the two separate plot approach communicates the information better. First, the units of the y variables for application, admitted students matriculants are in count numbers while that for admission and yield rates are in percentage. Therefore combining them together, it is confusing to read the y axis. Furthermore, the context of the two kinds of variables as mentioned above is quite different; looking at admission numbers and the rates separately would be easier to read and also make the two plots more visibly pleasant.
Code
# bar chart databar_data <- admin_dat %>%mutate(cat_a = applications - admitted_students,cat_b = admitted_students - matriculants,cat_c = matriculants) %>%# or list 'cat_a, cat_b, cat_c'select(year, contains("cat_")) %>%# or list 'c("cat_a", "cat_b", "cat_c")'pivot_longer(cols =-year,names_to ="category")# bar chart labelsbar_labels <- admin_dat %>%select(-contains("rate")) %>%pivot_longer(cols =-year,names_to ="category") %>%mutate(col_labels =prettyNum(value, big.mar =","))# line chart dataline_data <- admin_dat %>%select(year, admission_rate, yield_rate) %>%pivot_longer(cols =-year,names_to ="category")# line chart labelsline_labels <- line_data %>%mutate(label =str_c(value, "%", sep =""))
The box plot shows that for most quarters, the median shots Curry made were around 20ft away from the hoop. Comparing the medians, he tended to miss the shots when he was nearer to 25ft and tended to make successful shots at a little closer than 20ft away from the hoop in general. Curry’s performance was moderately consistent over the first three quarters, where the median, interquartile ranges, and range for made and missed shots were similar. In quarter 3, he attempted some shots father away from the hoop, but he missed them. In quarter 4, the median of the distance from the hoop for his made shots were lower, being around 15ft. During overtime, shots he made were very close to the hoop and he makes less three-point shots.
The density plot shows us that Curry attempted the most shots at around 25ft distance, and also at 2ft which is nearer to the hoop. At around 25ft distance, he had a little more misses than shots he makes. On the other hand, when he was nearer to the hoop at between 0 to 3ft distance, he was able to make almost as twice many shots than he missed.
According to the heat map, Curry attempted the most shots right by the hoop or around the three-point line especially nearer to the center of the court. In these areas he made more than 15 shot attempts. He also attempted some free-throw shots, as its area is colored more orange. He made less attempts in other areas. An interesting shot is near the upper right corner of the graphic; this seems like the missed shot he attempted in Q1, which we saw as an outlying value from over 40ft in the previous box plot.
3. 2020 Presidential Election - Biden vs. Trump
This data visualization uses the ga_election_data.csv dataset in conjunction with mapping data ga_map.rda. The ga_election_data.csv dataset contains the state of Georgia’s county level results for the 2020 US presidential election.
Here is a short description of the variables it contains:
County - name of county in Georgia
Candidate - name of candidate on the ballot,
Election Day Votes - number of votes cast on election day for a candidate within a county
Absentee by Mail Votes - number of votes cast absentee by mail, pre-election day, for a candidate within a county
Advanced Voting Votes - number of votes cast in-person, pre-election day, for a candidate within a county
Provisional Votes - number of votes cast on election day for a candidate within a county needing voter eligibility verification
Total Votes - total number of votes for a candidate within a county
I have also included the map data for Georgia (ga_map.rda) which was retrieved using tigris::counties().
Holding the 2020 US Presidential election during the COVID-19 pandemic was a massive logistical undertaking. Voter engagement was extremely high which produced a historical high voting rate. Voting operations, headed by states, ran very monthly and encountered few COVID-19 related issues. The state of Georgia did a particularly good job at this by encouraging their residents to use early voting. About 75% of the vote in a typical county voted early! Ignoring county boundaries, about 4 in every 5 voters, 80%, in Georgia voted early.
While it is clear that early voting was the preferred option for Georgia voters, I want to investigate whether or not one candidate/party utilized early voting more than the other – I am focusing on the two major candidates.
Map
Code
# dataga_graph <- ga_dat %>%# make new column for proportion of votes on pre-election day / early votingmutate(prop_pre_eday = (absentee_by_mail_votes + advanced_voting_votes) / total_votes) %>%# select all columns except those that include "_vote"select(-contains("_vote")) # biden map databiden_map_data <- ga_map %>%# return all rows for candidate Biden and all columns from x and yleft_join( ga_graph %>%filter(candidate =="Joseph R. Biden"),by =c("name"="county") )# trump map datatrump_map_data <- ga_map %>%# return all rows for candidate Trump and all columns from x and yleft_join( ga_graph %>%filter(candidate =="Donald J. Trump"),by =c("name"="county") )# biden plotbiden <-ggplot(biden_map_data) +geom_sf(aes(fill = prop_pre_eday), show.legend =FALSE) + ggthemes::theme_map() +theme(plot.subtitle =element_text(size =12),plot.title =element_text(face ="bold", size =14)) +labs(title ="Joseph R. Biden",subtitle ="Democratic Nominee") +scale_fill_gradient2(limits =c(0.5, 1),low ="#1AFF1A", high ="#5D3A9B",mid ="white", midpoint =0.75)# trump plottrump <-ggplot(trump_map_data) +geom_sf(aes(fill = prop_pre_eday)) + ggthemes::theme_map() +theme(plot.subtitle =element_text(size =12),plot.title =element_text(face ="bold", size =14),legend.position =c(0.75, 1),legend.justification =c(0, 1),legend.text =element_text(size =10)) +labs(title ="Donald J. Trump",subtitle ="Republican Nominee") +scale_fill_gradient2(limits =c(0.5, 1),low ="#1AFF1A", high ="#5D3A9B",mid ="white", midpoint =0.75,breaks =seq(0.5, 1.0, 0.25),labels =c("50%", "75%", "100%"),name =NULL)# final plot(biden + trump) +plot_annotation(title ="Percentage of votes from early voting",caption ="Georgia: 2020 US Presidential Election Results",theme =theme(plot.title =element_text(face ="bold", size =24),plot.caption =element_text(size =10)) )
Summary
This graphic compares the percentage of votes from early voting in Georgia for Joseph R. Biden and Donald J. Trump. The colors represent the early voting rates in th specific regions of Georgia. At first sight, we notice that there is more green in Trump’s map than Biden’s, which shows more white and purple. This means that more early voting percents are near 50% for Trump, and most votes are between 75% and 100% for Biden. In general, this map concisely shows us that the percentage of votes from early voting for Biden is higher than that for Trump in Georgia during the 2020 US Presidnetial Election.