This is a brief analysis of the facts about TED Talks.
TED [Technology, Entertainment, Design], with the slogan “ideas worth spreading”, is an organization that holds conferences and events and shares the talks via online media. In the last decade, TED Talks grew rapidly and gained considerable influence. In many cases, TED has been recognized as a major resource for gaining new thoughts and knowledge, especially among the younger generations. Therefore, it may be both interesting and important to take a close look at what TED has provided to its audiences:
-What topics have been offered to the audiences?
-What does the watching trend tell us?
-What are the features of the most popular TED Talks?
-What kinds of talks attract discussion?
-What tags and labels did the TED audiences prefer?
-What types of events does TED provide?
-Where are the most recent TED events held?
With the data retrieved from TED Talks, the answers to these questions have been determined and visualized with several data processing tools. This analysis presents the data and graphs that answer the questions above. More importantly, besides what has been provided, it also aims to show what is absent: what else should be offered to these curious people but is still missing.
The datasets used in this analysis are from two sources.
TED_Main dataset: This dataset was built by Rounak Banik and last updated in September 2017 (https://www.kaggle.com/rounakbanik/ted-talks). It contains information about all talks up to September 21st, 2017, including the number of views, the number of comments, descriptions, speakers, and titles, all retrieved from the official TED Talks website.
TED_Events geographic: I collected this dataset in 2018. It includes the date, year, location, and type of each TED event. All data were retrieved from the official TED Talks website (https://www.ted.com/tedx/events).
summary(TED_Main)
##      ana_id          title               name            num_speaker
##  Min.   :   1.0   Length:2534        Length:2534        Min.   :1.000
##  1st Qu.: 634.2   Class :character   Class :character   1st Qu.:1.000
##  Median :1267.5   Mode  :character   Mode  :character   Median :1.000
##  Mean   :1267.5                                         Mean   :1.028
##  3rd Qu.:1900.8                                         3rd Qu.:1.000
##  Max.   :2534.0                                         Max.   :5.000
##    languages        event               City             Country
##  Min.   : 1.00   Length:2534        Length:2534        Length:2534
##  1st Qu.:23.00   Class :character   Class :character   Class :character
##  Median :28.00   Mode  :character   Mode  :character   Mode  :character
##  Mean   :27.34
##  3rd Qu.:33.00
##  Max.   :72.00
##      Type               year               views
##  Length:2534        Length:2534        Min.   :   50443
##  Class :character   Class :character   1st Qu.:  754714
##  Mode  :character   Mode  :character   Median : 1120485
##                                        Mean   : 1698418
##                                        3rd Qu.: 1697459
##                                        Max.   :47227110
##  Views_Million         comments       speaker_occupation   X__1
##  Min.   :  0.5044   Min.   :   2.00   Length:2534          Mode:logical
##  1st Qu.:  7.5471   1st Qu.:  63.25   Class :character     NA's:2534
##  Median : 11.2049   Median : 118.00   Mode  :character
##  Mean   : 16.9842   Mean   : 191.87
##  3rd Qu.: 16.9746   3rd Qu.: 221.75
##  Max.   :472.2711   Max.   :6404.00
##     Label             rate_count       view_ranking    comment_ranking
##  Length:2534        Min.   :    1.0   Min.   :   1.0   Min.   :   1.0
##  Class :character   1st Qu.:   97.0   1st Qu.: 634.2   1st Qu.: 634.2
##  Mode  :character   Median :  219.0   Median :1267.5   Median :1267.5
##                     Mean   :  446.2   Mean   :1267.5   Mean   :1267.5
##                     3rd Qu.:  448.8   3rd Qu.:1900.8   3rd Qu.:1900.8
##                     Max.   :21444.0   Max.   :2534.0   Max.   :2534.0
summary(TED_Event)
##      Date              Event               City
##  Length:614         Length:614         Length:614
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##    Country           Event Type
##  Length:614         Length:614
##  Class :character   Class :character
##  Mode  :character   Mode  :character
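One detail in the TED_Main summary is worth checking before any plot uses it: the Views_Million column ranges from 0.5044 to 472.2711, while views ranges from 50443 to 47227110. The short sketch below, which only reuses those four numbers from the summary output, suggests that the column stores views divided by 100,000 rather than by 1,000,000. The variable names are illustrative and not part of the original analysis.

```r
# Extreme values copied from the summary(TED_Main) output.
views <- c(min = 50443, max = 47227110)
views_million_column <- c(min = 0.5044, max = 472.2711)

# Ratio between the raw counts and the stored column,
# rounded to three significant digits to absorb print rounding.
scale_used <- signif(views / views_million_column, 3)
print(scale_used)

# The same view counts expressed in actual millions.
views_in_millions <- views / 1e6
print(views_in_millions)
```

If this reading is right, the most viewed talk has roughly 47 million views, not 472 million, and any axis labeled "in millions" should be built from views / 1e6.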
A preliminary analysis was done in R. The first and second graphs show how many languages each talk is available in and how that number relates to the number of comments. The third graph shows the number of views by year, color-coded by the label that audiences voted for most often.
library(readxl)
library(ggplot2)
TED_Main <- read_excel("D:/UMich/Winter2018/SW 670/Data/Final Data/TED_Main.xlsx")
# Graph 1: distribution of the number of languages offered per talk
ggplot(TED_Main,
       aes(x = languages)) +
  geom_bar(fill = "turquoise") +
  theme_dark() +
  labs(title = "Number of Languages",
       subtitle = "offered by TED Talk videos",
       caption = "The majority of TED videos support 20-40 languages",
       x = "Number of Offered Languages",
       y = "Count of Videos")
# Graph 2: comments against number of languages, colored by views
ggplot(TED_Main,
       aes(x = languages, y = comments, color = views)) +
  geom_point(size = 2) +
  geom_smooth(color = "gold") +
  theme_classic(base_size = 12) +
  labs(title = "Talks Offered in More Languages Tend to Get More Comments",
       subtitle = "Languages x Comments",
       x = "Number of Offered Languages",
       y = "Number of Comments")
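# Graph 3: views by year, colored by the most popular label
# Views_Million holds views / 100,000 (its maximum is 472.2711 while the
# maximum of views is 47,227,110), so the raw count is rescaled here instead.
ggplot(TED_Main,
       aes(x = year, y = views / 1e6, color = Label)) +
  geom_point(size = 3.5, alpha = 0.5) +
  theme(axis.text = element_text(size = 10, angle = 45)) +
  labs(title = "Number of Views by Year",
       subtitle = "Color Coded by the Most Popular Label",
       caption = "Audiences love the funny ones!",
       x = "Year",
       y = "Number of Views (Millions)",
       color = "Most Popular Label")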
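The relationship drawn in the second graph can also be summarized with a single number. The sketch below computes Spearman's rank correlation, which may suit this data because comment counts are strongly right-skewed (median 118, maximum 6404 in the summary). The `toy` data frame is a hypothetical placeholder, not values from TED_Main, and the choice of Spearman's rho is an assumption rather than the method used in the original analysis.

```r
# Hypothetical toy data standing in for the languages and comments columns.
toy <- data.frame(
  languages = c(5, 12, 20, 25, 28, 33, 40, 55),
  comments  = c(10, 40, 35, 120, 90, 200, 180, 400)
)

# Spearman's rho works on ranks, so a few talks with very large
# comment counts do not dominate the result.
rho <- cor(toy$languages, toy$comments, method = "spearman")
print(rho)
```

On the real data the equivalent call would be cor(TED_Main$languages, TED_Main$comments, method = "spearman"). A high value would indicate association only; it would not show that offering more languages causes more comments, since popular talks are likely to attract both translations and discussion.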
Further analysis was done with Tableau. These graphs show the relationships between the label and other indicators, such as the number of views, the available languages, the number and ranking of comments, and the top-rated labels in different countries.
library(png)
library(grid)
img <-readPNG("C:/Users/sony-s/Desktop/Tableau.png")
grid.raster(img)
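One of the Tableau views, the top-rated label per country, can be approximated in base R as well. The following is a minimal sketch under stated assumptions: the `talks` data frame is a made-up placeholder that only mimics the Country and Label columns of TED_Main, and "top-rated" is taken to mean the label that appears most often within a country.

```r
# Hypothetical placeholder rows mimicking the Country and Label columns.
talks <- data.frame(
  Country = c("USA", "USA", "USA", "UK", "UK"),
  Label   = c("Funny", "Inspiring", "Funny", "Informative", "Informative"),
  stringsAsFactors = FALSE
)

# Cross-tabulate countries (rows) against labels (columns).
counts <- table(talks$Country, talks$Label)

# For each country, pick the label with the highest count.
# which.max returns the first maximum, so ties go to the
# alphabetically earliest label.
top_label <- colnames(counts)[apply(counts, 1, which.max)]
names(top_label) <- rownames(counts)
print(top_label)
```

Replacing `talks` with TED_Main should give one label per country, which can then be compared against the Tableau view.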
A QGIS map was composed to show the most recent TED events worldwide. The events are color-coded by type and shape-coded by date, while the locations are color-coded by population.
library(png)
library(grid)
img <-readPNG("C:/Users/sony-s/Desktop/TEDMAP.png")
grid.raster(img)
Part of the data was scraped from the official TED website and is available under a Creative Commons license.