Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
Through a treemap, a word cloud, and string analysis, the author wishes to show the target audience the trends and topics of TED Talks in recent years. New TED viewers and researchers may be interested in these findings, so they are the target audience.
Main issues:
Reference
An open dataset from Kaggle (CC BY-NC-SA 4.0) containing information about all audio-video recordings of TED Talks uploaded to the official TED.com website until September 21st, 2017. Website: https://www.kaggle.com/gsdeepakkumar/lets-talk-about-ted-talks/log
The following code was used to fix the issues identified in the original.
library(ggplot2) # Data visualisation
library(dplyr) # data manipulation
library(stringr) # String manipulation
library(colourpicker)
ted=read.csv("/Users/qmoa_liu/downloads/assignment2template1950/ted_main.csv",header=TRUE,stringsAsFactors = FALSE)
transcript=read.csv("/Users/qmoa_liu/downloads/assignment2template1950/transcripts.csv",header=TRUE,stringsAsFactors = FALSE)
# objective1
test = ted %>% select(description,name,duration,views) %>% mutate(deslength=str_length(description))
summary(test$deslength)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52.0 236.0 296.0 313.7 379.0 769.0
# Plot the five talks with the longest descriptions; the y-axis is zoomed to 650-775 characters
test %>%
  arrange(desc(deslength)) %>%
  head(5) %>%
  ggplot(aes(reorder(name, deslength), deslength, fill = name)) +
  geom_bar(stat = "identity") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  labs(x = "Name", y = "Description length", title = "Top 5 talks with most description length") +
  coord_cartesian(ylim = c(650, 775))
# objective2
CPCOLS <- c("#1f78b4", "#33a02c", "#FFD700", "#EE82EE", "#6B6B6B", "#8B5A00", "#8B2252", "#7FFF00", "#fb9a99", "#fdbf6f", "#cab2d6", "#ffff99", "#A020F0", "#00F5FF", "#e31a1c", "#ff7f00", "#6a3d9a", "#b15928", "#a6cee3", "#b2df8a")
test=ted %>% group_by(event) %>% tally() %>% arrange(desc(n))
# Bar chart of the 20 events with the most talks, in descending order, one colour per event
ggplot(head(test, 20), aes(factor(event, levels = event), n, fill = event)) +
  geom_bar(stat = "identity") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  scale_fill_manual(values = CPCOLS) +
  labs(x = "Event", y = "Number of Talks", title = "Top 20 Events by number of talks")
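The code above reads the CSV files from a local path, so it cannot be run without the Kaggle download. The following is a minimal sketch of the same two summaries (description length and talks per event) on a small made-up data frame. The object `toy` and all of its values are hypothetical and are not taken from the dataset; `nchar()` and `count()` are used here as stand-ins for `str_length()` and `group_by() %>% tally()`.

```r
library(dplyr)

# Hypothetical toy data standing in for ted_main.csv (all values are made up)
toy <- data.frame(
  name = c("Talk A", "Talk B", "Talk C", "Talk D"),
  event = c("TED2015", "TED2015", "TEDGlobal 2012", "TED2015"),
  description = c("Short.", "A somewhat longer description.", "Mid-length text.", "Tiny"),
  stringsAsFactors = FALSE
)

# Description length in characters, longest first
by_length <- toy %>%
  mutate(deslength = nchar(description)) %>%
  arrange(desc(deslength))

# Number of talks per event, most frequent first
event_counts <- toy %>%
  count(event, sort = TRUE)

print(by_length[, c("name", "deslength")])
print(event_counts)
```

Swapping `toy` for the `ted` data frame should give the same tables that feed the two bar charts, assuming the columns `name`, `event`, and `description` are present as in the code above.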
Data Reference
The following plot fixes the main issues in the original.