Objective
The objective of the original visualisation is to show an increasing trend in the number of women who are conferred college degrees. Its target audience is real estate companies and land use experts who want to adjust their business strategy to current market trends. The creator of the original visualisation used it to convince this audience that more female students now graduate with a college degree and are therefore more likely to earn higher incomes and gain greater purchasing power when buying property.
The visualisation chosen had the following three main issues:
Misrepresentation: From the graph alone, it is hard to tell whether the percentages are based on the total number of students enrolled in college or on those who were granted college degrees. The original data from the National Center for Education Statistics (NCES) confirms that it is the latter. In addition, the size of the figures is not proportional to the percentages, as the two 58% figures are drawn at different sizes. The original data suggests that the figure size may instead represent the absolute number of women who received college degrees in that year.
Inaccuracy: The data shows a trend over time. The original visualisation tries to convey this trend through the increasing size of the figures, but this does not represent the data accurately. The original data suggests that the percentage of college degrees conferred to female students plateaued after 2000, yet the increasing size of the figures misleads the audience into believing that the percentage is still rising. Furthermore, it is not clear how the data for the single year 1970 was obtained. The original NCES data has no entry for 1970 alone; it only shows aggregated data for 1970 to 1980. It is suspected that the creator of the visualisation used the statistics for 1969 to 1970.
Lack of accountability: The birth-year data could not be found in the original data source. It is unclear how these values were calculated and what factors were considered in the calculation. Assuming that all students who attended college in the same year are the same age would introduce bias. The students' birth year also adds no value in conveying the intended message to the audience. It is therefore suggested that this data be left out of the visualisation.
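The size mismatch can be checked with a short calculation. The sketch below assumes that the two 58% figures correspond to 2010 and 2012 (the original graphic does not confirm this) and uses the counts and percentages from the reconstruction data; the variable names are illustrative only.

```r
# Assumption: the two "58%" figures are the 2010 and 2012 entries.
# Values are taken from the reconstruction data; names are illustrative.
count_2010 <- 1421136
count_2012 <- 1503139
perc_2010 <- 58.08
perc_2012 <- 57.99

# Gap between the two percentages, in percentage points
perc_gap <- abs(perc_2012 - perc_2010)

# Ratio of the absolute counts for the same two years
count_ratio <- count_2012 / count_2010

round(perc_gap, 2)
round(count_ratio, 3)
```

Under this assumption the percentages differ by less than 0.1 percentage points while the count grows by roughly 6%, which is consistent with the figure size tracking the absolute count, not the percentage.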
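The plateau can be made visible by computing the change in percentage points between consecutive selected years. This is a minimal sketch that uses the percentages from the reconstruction data; the data frame and column names are illustrative.

```r
# Female share of degree recipients (%), from the reconstruction data
year <- c(1969, 1980, 1990, 2000, 2010, 2012)
perc <- c(42.15, 49.68, 53.74, 57.52, 58.08, 57.99)

# Percentage-point change between consecutive selected years
change <- data.frame(
  from = head(year, -1),
  to = tail(year, -1),
  pp_change = round(diff(perc), 2)
)
change
```

The first three intervals each gain several percentage points, while the two intervals after 2000 change by less than one point. Note that the selected years are not evenly spaced, so the last interval covers only two years.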
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)

# Female degree recipients (count and share of all recipients) for selected years
numbers <- data.frame(
  Year  = c("1969", "1980", "1990", "2000", "2010", "2012"),
  Count = c(424009, 614915, 772514, 988063, 1421136, 1503139),
  Perc  = c(42.15, 49.68, 53.74, 57.52, 58.08, 57.99)
)

# Factor that maps the percentage onto the count axis (1% is drawn at 20,000)
scale_factor <- 20000

p1 <- ggplot(numbers, aes(x = Year, group = 1)) +
  geom_col(aes(y = Count), fill = "orange3") +
  geom_text(aes(y = Count, label = Count), vjust = 1.5) +
  geom_line(aes(y = Perc * scale_factor), colour = "turquoise3") +
  geom_point(aes(y = Perc * scale_factor), colour = "turquoise3") +
  geom_text(aes(y = Perc * scale_factor, label = paste0(Perc, "%")), vjust = -1) +
  labs(
    title = "Number of female students who completed Bachelor's and Master's degrees in\nselected years: 1969, 1980, 1990, 2000, 2010, 2012",
    y = "Total number of female graduates"
  ) +
  scale_y_continuous(sec.axis = sec_axis(~ . / scale_factor, name = "Female share of degree recipients (%)")) +
  theme(plot.title = element_text(hjust = 0.5))
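The plot draws the percentage line on the count axis by multiplying by 20000, and the secondary axis divides by the same factor to recover the percentage labels. The sketch below illustrates that round trip in base R; the helper names `to_count_axis` and `to_perc_axis` are hypothetical and are not part of the plot code.

```r
# Hypothetical helpers; 20000 is the factor used in the plot code
scale_factor <- 20000
to_count_axis <- function(perc) perc * scale_factor
to_perc_axis <- function(count) count / scale_factor

perc <- c(42.15, 49.68, 53.74, 57.52, 58.08, 57.99)

# Positions at which the percentage line is drawn on the count axis
on_count_axis <- to_count_axis(perc)
range(on_count_axis)

# Dividing by the same factor gives back the original percentages
all.equal(to_perc_axis(on_count_axis), perc)
```

With this factor, all percentage points fall within the range of the bars, so both series share one panel without the line running off the top of the plot.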
Data Reference
The following plot fixes the main issues in the original.