---
title: "Assignment 2"
subtitle: "Deconstruct, Reconstruct Web Report"
author: "Naresh Shankar (s3927631)"
output: html_document
urlcolor: blue
---
Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of the original visualisation was to show that women’s participation in athletics has been less prominent than men’s, because men have historically been given more opportunities to compete. The target audience is the general public, who should come away understanding the position of women in college athletics today.
The visualisation chosen had the following three main issues:
Reference
Gibbs, L (2020), “Data shows the staggering state of gender inequity in sports”. The original visualisation is the second graph in the section “The opportunity gap isn’t closing” - https://www.powerplays.news/p/data-shows-the-staggering-state-of?s=r
Slutsky, David J (2014), “The Effective Use of Graphs” viewed on 3rd April 2022 - https://ncbi.nlm.nih.gov/pmc/articles/PMC4078179/
The following code was used to fix the issues identified in the original. Note that I first copied the dataset from the PDF into an Excel file and then performed the steps below.
# Load the plotting and reshaping packages (tidyr also re-exports the %>% pipe)
library(ggplot2)
library(tidyr)
# Avoid scientific notation in the axis and text labels
options(scipen = 10000)
# Read the NCAA data copied into Excel, skipping the first row
df_sports <- readxl::read_xlsx("~/My_R_Projects/Data_Visualisation/Assessment 2/NCAA_Dataset.xlsx", skip = 1)
# Keep only the part of Year before the hyphen
df_sports$Year <- gsub("\\-.*", "", df_sports$Year)
# Reshape to long format: one row per year and sex
df_sports_long <- df_sports %>% gather("sex", "TotalStudents", c("Men", "Women"))
# Draw one line per sex, with a point at each year
df_sports_plot <- ggplot(data = df_sports_long, aes(x = as.factor(Year), y = TotalStudents, group = sex))
df_sports_plot <- df_sports_plot + geom_line(aes(color = sex)) + scale_color_manual(values = c("blue", "orange"))
df_sports_plot <- df_sports_plot + geom_point(size = 1)
# Start the y-axis at zero and drop overlapping year labels on the x-axis
df_sports_plot <- df_sports_plot + scale_y_continuous(limits = c(0, 300000)) + scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
df_sports_plot <- df_sports_plot + labs(title = "Student Participation in College Athletics", subtitle = "Gender-wise Athletics Participation", y = "No. of Students", x = "Year", caption = "Source: NCAA Sports Sponsorship and Participation Rates Report (Oct 2018)")
# Label the points with their values, mapping the label inside aes() instead of indexing the data frame
df_sports_plot <- df_sports_plot + geom_text(aes(label = TotalStudents), check_overlap = TRUE, nudge_x = 0.25, nudge_y = 0.25, hjust = "inward", vjust = "bottom", size = 3)
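The reshaping step above uses gather(), which the tidyr documentation marks as superseded by pivot_longer(). As a minimal sketch, the same wide-to-long step could be written as follows. The data frame df_example and its values are made up for illustration and are not the NCAA figures. Only the column names Year, Men and Women are taken from the code above, and the "2000-01" year format is an assumption.

```r
library(tidyr)

# Hypothetical stand-in for the Excel data; these values are invented
df_example <- data.frame(
  Year = c("2000-01", "2001-02"),
  Men = c(100, 110),
  Women = c(60, 70)
)

# Same clean-up as above: keep only the part of Year before the hyphen
df_example$Year <- gsub("\\-.*", "", df_example$Year)

# Wide to long: one row per Year and sex
df_example_long <- pivot_longer(
  df_example,
  cols = c("Men", "Women"),
  names_to = "sex",
  values_to = "TotalStudents"
)

print(df_example_long)
```

Unlike gather(), pivot_longer() orders the result row by row (both sexes for the first year, then both for the next). This should not change the plot, because ggplot2 groups the lines by sex.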
Data Reference
Other Coding References
Schork, Joachim, “Remove Characters Before or After Point in String in R (Example)” https://statisticsglobe.com/r-remove-characters-before-or-after-point-in-string
Stats Education, viewed on 3rd April 2022, “The gather() Function” to elongate my dataset. http://statseducation.com/Introduction-to-R/modules/tidy%20data/gather/
cmdline (2020), “ggplot2 3.3.0. Is Here : Two New Features You Must Know” subsection, “Dodge Overlapping Axis Text with guide_axis() in ggplot2 version 3.3.0” https://cmdlinetips.com/2020/03/ggplot2-2-3-0-is-here-two-new-features-you-must-know/
Wickham, Hadley, Navarro, Danielle and Pedersen, Thomas Lin (viewed 3rd April 2022) from the book, “ggplot2: Elegant Graphics for Data Analysis” Chapter “8 Annotations” subsection “8.2 Text labels” https://ggplot2-book.org/annotations.html
DataKwery (viewed on 3rd April 2022), “Scientific notation in R”, subsection “Universal”. The relevant text is “I generally apply the more aggressive solution by telling R to avoid all scientific notation by setting ‘options(scipen=999)’ at the top of the script” https://www.datakwery.com/post/2020-07-11-scientific-notation-in-r/
The graph has been remodelled as a line chart that shows how the participation of men and women has increased over the years. At first glance, the gap between the two is clearly visible and easy to comment on, which is how the general public is likely to read and interpret this information. The following plot fixes the main issues in the original.
Data Reference