Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of the original data visualisation is to show which languages are required to get hired at Amazon. The data was obtained by scraping the Amazon job portal at https://amazon.jobs. The targeted audience is university graduates, job-seekers or the general public.
The visualisation chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(readr)
library(stringr)
library(dplyr)
amazon <- read_csv("C:/RMIT/Data Visulization S1 19/Assignment2/amazon_jobs_dataset.csv")
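# What languages are required for jobs?
occurence <- c()
# Each entry is passed to str_count() as a regular expression,
# so the '.' in 'java.' matches any character and 'c[++]' matches 'c+'
languages <- c('swift','matlab','mongodb','hadoop','cosmos', 'sql','spark', 'pig', 'python', 'java,', 'java.', 'java ', 'c[++]', 'php', 'javascript', 'objective c', 'ruby', 'perl', 'c ', 'c#', ' r,')
# Lower-case the text so that matching is case-insensitive
amazon$`PREFERRED QUALIFICATIONS` <- tolower(amazon$`PREFERRED QUALIFICATIONS`)
# Count the matches of each pattern across all job postings, ignoring missing values
for(i in languages){
  amazon$number <- str_count(amazon$`PREFERRED QUALIFICATIONS`, i)
  occurence <- c(occurence, sum(amazon$number, na.rm = TRUE))
}
occurence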
## [1] 26 9 16 164 1 408 128 20 378 392 1080 281 504 34
## [15] 348 79 237 188 630 139 7
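Because str_count() treats each pattern as a regular expression, some of these counts may overlap: as a regex, 'java.' also matches 'java,', 'java ' and the 'javas' in 'javascript'. The following is a minimal sketch of that effect on three made-up strings. The `quals` vector is hypothetical and is not taken from the dataset. It compares the regex pattern with a literal match via fixed() and with a word-boundary pattern.

```r
library(stringr)

# Hypothetical example strings, not rows from the Amazon dataset
quals <- c(
  "experience with java, python or c++",
  "strong javascript and java. skills",
  "knowledge of java and sql"
)

# As a regex, '.' matches any character, so this counts
# 'java,', 'javas' (from 'javascript'), 'java.' and 'java '
regex_count <- sum(str_count(quals, "java."))

# fixed() matches the literal text 'java.' only
literal_count <- sum(str_count(quals, fixed("java.")))

# Word boundaries count 'java' as a whole word and skip 'javascript'
word_count <- sum(str_count(quals, "\\bjava\\b"))

c(regex = regex_count, literal = literal_count, word = word_count)
```

If the same overlap occurs in the real data, summing the three 'java' counts would count some mentions more than once. This sketch does not establish how large that effect is in the Amazon dataset.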
# Combine the occurrences of 'java,', 'java.' and 'java ' into 'java' (392 + 1080 + 281 = 1753)
languages_new <- data.frame(
  Languages = c('swift','matlab','mongodb','hadoop','cosmos', 'sql','spark', 'pig', 'python', 'java', 'c[++]', 'php', 'javascript', 'objective c', 'ruby', 'perl', 'c', 'c#', 'r'),
  Occurence = c(26, 9, 16, 164, 1,408, 128, 20, 378, 1753, 504, 34, 348, 79, 237, 188, 630, 139, 7)
)
# Reorder the languages by descending Occurence
languages_new$Languages <- languages_new$Languages %>% factor(levels = languages_new$Languages[order(-languages_new$Occurence)])
# Bar chart with rotated x-axis labels and the count printed above each bar
p <- ggplot(data = languages_new, aes(x = Languages, y = Occurence))
p1 <- p +
  geom_bar(stat = "identity", fill = "tan3") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Languages in Amazon's job description \nJuly 2011 to March 2018", y = "Occurence", x = "Languages in job description") +
  geom_text(aes(label = Occurence), vjust = -0.5, size = 3)
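The counts in languages_new are typed in by hand from the printed output, so they would need to be retyped if the data or the patterns changed. As a sketch of one possible alternative, the table could be built directly from the counts. The `quals` vector, the shortened `patterns` list and the `languages_tbl` name below are hypothetical stand-ins, not part of the original code.

```r
library(stringr)

# Hypothetical stand-in for the lower-cased PREFERRED QUALIFICATIONS column
quals <- c("python and sql", "java, python", NA, "sql, java. and python")

# Named patterns: the name is the label shown on the plot,
# the value is the regular expression that is counted
patterns <- c(python = "python", sql = "sql", java = "\\bjava\\b")

# Total matches per pattern across all rows, ignoring missing values
counts <- sapply(patterns, function(p) sum(str_count(quals, p), na.rm = TRUE))

# One row per language, using the same column names as languages_new
languages_tbl <- data.frame(Languages = names(counts), Occurence = unname(counts))

# Order the factor levels by descending count so the largest bar is drawn first
languages_tbl$Languages <- reorder(languages_tbl$Languages, -languages_tbl$Occurence)
levels(languages_tbl$Languages)
```

A data frame built this way could be passed to the same ggplot() call in place of languages_new, since the column names match.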
Data Reference
The following plot fixes the main issues in the original.