Data Visualisation - Assignment 2

The Original, Code and Reconstruction sections describe the issues and how they were fixed.

Original

Source: Sources of Funding for Academic Research and Development in the Humanities and Other Selected Fields, Fiscal Year 2017.

Objective

The objective of the original visualisation is to highlight the level of investment in the humanities compared with STEM fields, and in particular how much more the humanities rely on funding from the academic institutions themselves: “While over two-thirds of humanities R&D came from the institutions themselves, in every STEM field examined here, no more than 35% of R&D was funded this way.” (Anon., 2019)

The visualisation chosen had the following three main issues:

Issue 1: Deceptive methods (Pie Chart)
The use of many pie charts makes the data hard for the reader to decipher. The angles and areas of the segments are difficult to compare, both within a chart and between charts. Since the goal is to compare the proportion of each funding source for the humanities against the STEM disciplines, a pie chart is a poor choice. A bar chart, with one stacked bar per field, makes the proportional funding much easier to compare across fields.
Issue 2: Perception (Colours)
The unnecessary background colour, while attractive, distracts the reader from the information. The colours used are very bright and highly contrasting, which makes them hard on the eye. Removing the background and using softer, more pastel colours makes the chart easier to read.
Issue 3: Accuracy (No Data labels)
The goal is to show clearly that government investment in the humanities is low. Without data labels, the reader cannot read off the size of each share, so the graph does not back up the claim. Adding clear percentage labels makes the comparison data-driven.
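
The three fixes can be sketched together on a small example. This is a minimal R sketch under stated assumptions, not the report's code: the fields, funding sources and amounts are hypothetical placeholders rather than the NSF figures, `position = "fill"` is used as a shortcut that rescales each bar to 1 (the Code section computes proportions with `prop.table()` instead), and the `Pastel1` palette from `scale_fill_brewer()` stands in for the report's hand-picked colours.

```r
library(ggplot2)

# Hypothetical illustration values (NOT the NSF FY2017 figures)
toy <- data.frame(
  Field         = rep(c("Humanities", "Engineering"), each = 3),
  FundingSource = rep(c("Academic Institution", "Federal government", "Business"), times = 2),
  Amount        = c(70, 20, 10, 25, 55, 20)
)

# Share of each source within its own field, used only for the label text
toy$Proportion <- toy$Amount / ave(toy$Amount, toy$Field, FUN = sum)
toy$Label <- paste0(round(toy$Proportion * 100), "%")

# Issue 1: one bar per field; position = "fill" rescales each bar to a total of 1
# Issue 2: plain minimal theme and a pastel palette (assumed choice: Pastel1)
# Issue 3: percentage labels centred in each segment via position_fill(vjust = 0.5)
p_toy <- ggplot(toy, aes(x = Field, y = Amount, fill = FundingSource)) +
  geom_col(position = "fill") +
  geom_text(aes(label = Label), position = position_fill(vjust = 0.5), size = 3) +
  scale_fill_brewer(palette = "Pastel1") +
  labs(x = NULL, y = "Funding Proportion", fill = "Funding Source") +
  coord_flip() +
  theme_minimal()

p_toy
```

Because `geom_text()` inherits the same `fill` grouping as the bars, each label is stacked in the same order and lands in the middle of its own segment.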

Reference
Anon., 2019. Humanities Indicators: Research and Development Expenditures at Colleges and Universities. HIV-10d: Sources of Funding for Academic Research and Development in the Humanities and Other Selected Fields, Fiscal Year 2017. [Online] Retrieved August 31, 2019, from Humanities Indicators website: https://www.humanitiesindicators.org/content/indicatordoc.aspx?i=86

Code

The following code was used to fix the issues identified in the original.

library(readxl)   # read_excel()
library(dplyr)    # %>%, filter(), mutate(), select(), bind_cols()
library(tidyr)    # gather(), spread()
library(ggplot2)  # ggplot() and geoms
library(ggrepel)  # geom_label_repel()

# read the NSF HERD 2017 table, skipping the rows above the column headers
funding <- read_excel("herd2017_dst_12.xlsx", skip = 4)
names(funding)[1:2] <- c("Field", "Total R&D")
names(funding)[5] <- "Academic Institution"
funding <- funding %>% filter(!is.na(`Total R&D`))
funding <- funding[, -2]  # drop the total; keep one column per funding source

# keep only the fields that appear in the original graph
fields <- c("Biological and biomedical sciences", "Health sciences", "Mathematics and statistics",
            "Physical sciences", "Psychology", "Social sciences", "Engineering", "Non-S&E", "Humanities")
a <- funding %>%
  filter(Field %in% fields)

# transpose: one row per funding source, one column per field
a1 <- a %>% gather(key = "FundingSource", value = "funding", names(a)[2:7]) %>%
  spread(key = "Field", value = "funding")

# combine fields to match the categories of the original graph
a2 <- a1 %>% mutate(`Mathematical Physical and Statistical Sciences` = `Mathematics and statistics` + `Physical sciences`,
                    `Behaviour and Social Sciences` = Psychology + `Social sciences`,
                    `Other Non-Science & Engineering` = `Non-S&E` - Humanities,
                    `Humanities(Excluding Communication*)` = Humanities)
a3 <- a2 %>% select(FundingSource, `Humanities(Excluding Communication*)`, `Behaviour and Social Sciences`,
                    `Biological and biomedical sciences`, Engineering, `Health sciences`,
                    `Mathematical Physical and Statistical Sciences`, `Other Non-Science & Engineering`)

# convert each field's column to proportions of that field's total (margin = 2)
a4 <- bind_cols(a3[, 1], data.frame(prop.table(data.matrix(a3[-1]), 2)))
names(a4) <- c("FundingSource", "Humanities\n(Excluding \n Communication*)", "Behaviour and\n Social Sciences",
               "Biological and\n biomedical sciences", "Engineering", "Health sciences",
               "Mathematical,\n Physical and\n Statistical Sciences", "Other Non-Science\n & Engineering")

# back to long format for ggplot, with fields and funding sources in display order
f1 <- a4 %>% gather(key = "Field", value = "Proportion", names(a4)[-1])
f1$Field <- factor(f1$Field,
                   levels = rev(c("Humanities\n(Excluding \n Communication*)", "Other Non-Science\n & Engineering",
                                  "Behaviour and\n Social Sciences", "Biological and\n biomedical sciences",
                                  "Engineering", "Health sciences",
                                  "Mathematical,\n Physical and\n Statistical Sciences")),
                   ordered = TRUE)
f1$FundingSource <- factor(f1$FundingSource,
                           levels = rev(c("Academic Institution", "Federal government",
                                          "State and local government", "Business", "Nonprofit organizations",
                                          "All other sources")),
                           ordered = TRUE)

# creating the graph: one stacked bar per field, each segment labelled with its percentage
p <- ggplot(data = f1, aes(x = Field, y = Proportion, fill = FundingSource))
p1 <- p + geom_col(position = "stack") +
  labs(y = "Funding Proportion",
       title = "Sources of Funding for Academic Research",
       subtitle = "Fiscal Year 2017", fill = "Funding\n Source") +
  theme_minimal() +
  theme(legend.position = "bottom",
        axis.title.y = element_blank(), plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5)) +
  geom_label_repel(aes(label = paste0(round(Proportion, 2) * 100, "%")),
                   position = position_stack(vjust = 0.5), size = 3, force = 2) +
  coord_flip() +
  scale_fill_manual(values = c('#D6DFE0', '#FACBD0', '#ED7890', '#e6f5d0', '#a1d76a', '#E1C340'))

p1  # print the plot
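
The step that turns dollar amounts into bar lengths is `prop.table(x, 2)`, which divides every value by its column total. A minimal sketch on a hypothetical matrix (invented amounts; rows are funding sources, columns are fields) shows that each column then sums to 1, which is why every bar in the chart spans the full 0 to 1 axis.

```r
# Hypothetical amounts: rows = funding sources, columns = fields
amounts <- matrix(c(70, 20, 10,
                    25, 55, 20),
                  nrow = 3,
                  dimnames = list(c("Academic Institution", "Federal government", "Business"),
                                  c("Humanities", "Engineering")))

# margin = 2 divides each column by its own total (column proportions)
props <- prop.table(amounts, 2)

colSums(props)                   # each column sums to 1
props["Academic Institution", ]  # 0.70 for Humanities, 0.25 for Engineering
```

Using margin `1` instead would divide by row totals, giving each funding source's split across fields, which is not the comparison the chart is meant to make.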

Data Reference

National Science Foundation, National Center for Science and Engineering Statistics. Higher education R&D expenditures, by source of funds and R&D field: FY 2017. Retrieved August 31, 2019, from U.S. National Science Foundation website: https://ncsesdata.nsf.gov/herd/2017/html/herd2017_dst_12.html

Reconstruction

The following plot fixes the main issues in the original.

Deconstruct, Reconstruct Web Report

Yinan Chris Zhang (s3489428)