This is an R Markdown document based on the notebook written by Mikael Huss, available here. The original Python notebook can be found here.


The world of TED

Founded in 1984 by Richard Saul Wurman as a nonprofit organisation that aimed to bring together experts from the fields of Technology, Entertainment and Design, TED Conferences have gone on to become the Mecca of ideas from virtually all walks of life. As of 2015, TED and its sister TEDx chapters have published more than 2000 talks for free consumption by the masses, and its speaker list boasts the likes of Al Gore, Jimmy Wales, Shahrukh Khan and Bill Gates.

TED, which operates under the slogan ‘Ideas worth spreading’, has managed the incredible feat of bringing together world-renowned experts from various walks of life and study and giving them a platform to distill years of their work and research into talks of 18 minutes in length. What’s even more incredible is that their invaluable insights are available on the Internet for free.

Ever since I began watching TED Talks in high school, they have never ceased to amaze me. I have learned an incredible amount, about fields I was completely alien to, in the form of poignant stories, breathtaking visuals and subtle humor. So in this notebook, I wanted to find insights about the world of TED, its speakers and its viewers, and to answer a few questions that I had always had in the back of my mind.

The data has been obtained by running a custom web scraper on the official TED.com website. The data is shared under the Creative Commons License (just like the TED Talks) and hosted on Kaggle. You can download it here: https://www.kaggle.com/rounakbanik/ted-talks


The Main TED Dataset

The main dataset contains metadata about every TED Talk hosted on the TED.com website until September 21, 2017. Let me give you a brief walkthrough of the kind of data available, to give you an idea of what is possible with this dataset.

library(tidyverse)
df <- read_csv("data/ted_main.csv")
names(df)
 [1] "comments"           "description"        "duration"           "event"              "film_date"         
 [6] "languages"          "main_speaker"       "name"               "num_speaker"        "published_date"    
[11] "ratings"            "related_talks"      "speaker_occupation" "tags"               "title"             
[16] "url"                "views"             

Features Available

I’m just going to reorder the columns in the order I’ve listed the features for my convenience (and OCD), and convert the Unix timestamps into a human readable format.

library(anytime)
df <- df %>%
  select(name, title, description, main_speaker, speaker_occupation,
         num_speaker, duration, event, film_date, published_date, comments, 
         tags, languages, ratings, related_talks, url, views) %>%
  # across() replaces the deprecated mutate_at(funs(...)) idiom
  mutate(across(c(film_date, published_date), ~ anydate(.x, tz = 'UTC')))
glimpse(df)
Observations: 2,550
Variables: 17
$ name               <chr> "Ken Robinson: Do schools kill creativity?", "Al Gore: Averting the climate cr...
$ title              <chr> "Do schools kill creativity?", "Averting the climate crisis", "Simplicity sell...
$ description        <chr> "Sir Ken Robinson makes an entertaining and profoundly moving case for creatin...
$ main_speaker       <chr> "Ken Robinson", "Al Gore", "David Pogue", "Majora Carter", "Hans Rosling", "To...
$ speaker_occupation <chr> "Author/educator", "Climate advocate", "Technology columnist", "Activist for e...
$ num_speaker        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ duration           <int> 1164, 977, 1286, 1116, 1190, 1305, 992, 1198, 1485, 1262, 1414, 1538, 1550, 52...
$ event              <chr> "TED2006", "TED2006", "TED2006", "TED2006", "TED2006", "TED2006", "TED2006", "...
$ film_date          <date> 2006-02-25, 2006-02-25, 2006-02-24, 2006-02-26, 2006-02-22, 2006-02-02, 2006-...
$ published_date     <date> 2006-06-27, 2006-06-27, 2006-06-27, 2006-06-27, 2006-06-27, 2006-06-27, 2006-...
$ comments           <int> 4553, 265, 124, 200, 593, 672, 919, 46, 852, 900, 79, 55, 71, 242, 99, 325, 30...
$ tags               <chr> "['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'tea...
$ languages          <int> 60, 43, 26, 35, 48, 36, 31, 19, 32, 31, 27, 20, 24, 27, 25, 31, 32, 27, 22, 32...
$ ratings            <chr> "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', '...
$ related_talks      <chr> "[{'id': 865, 'hero': 'https://pe.tedcdn.com/images/ted/172559_800x600.jpg', '...
$ url                <chr> "https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity\n", "http...
$ views              <int> 47227110, 3200520, 1636292, 1697550, 12005869, 20685401, 3769987, 967741, 2567...
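
The conversion above relies on the anytime package. As a minimal sketch, the same step can be done in base R, assuming the raw columns hold Unix timestamps in seconds. The value below is an illustrative timestamp chosen for this example, not one read from the file.

```r
# Illustrative Unix timestamp in seconds since 1970-01-01 UTC (assumed value)
raw_film_date <- 1140825600

# Convert seconds since the epoch to a date-time, then keep only the date part
film_date <- as.Date(
  as.POSIXct(raw_film_date, origin = "1970-01-01", tz = "UTC"),
  tz = "UTC"
)
print(film_date)  # [1] "2006-02-25"
```

Fixing the time zone to UTC in both calls avoids dates shifting by a day on machines with a different local time zone.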

We also have another dataset which contains the transcript of every talk but we will get to that later. For now, let us begin with the analysis of TED Talks!
We have 2,550 talks at our disposal. These represent all the talks posted on the TED platform until September 21, 2017, and include talks filmed between 1994 and 2017. It has been over two glorious decades of TED.


Most Viewed Talks of All Time

For starters, let us perform some easy analysis. I want to know what the 15 most viewed TED talks of all time are. The number of views gives us a good idea of the popularity of the TED Talk.

pop_talks <- df %>%
  select(title, main_speaker, views, film_date) %>% 
  arrange(desc(views)) %>% 
  head(15)
pop_talks
# A tibble: 15 x 4
                                                                 title      main_speaker    views  film_date
                                                                 <chr>             <chr>    <int>     <date>
 1                                         Do schools kill creativity?      Ken Robinson 47227110 2006-02-25
 2                            Your body language may shape who you are         Amy Cuddy 43155405 2012-06-26
 3                                    How great leaders inspire action       Simon Sinek 34309432 2009-09-17
 4                                          The power of vulnerability       Brené Brown 31168150 2010-06-06
 5                              10 things you didn't know about orgasm        Mary Roach 22270883 2009-02-06
 6                          How to speak so that people want to listen   Julian Treasure 21594632 2013-06-10
 7                                                My stroke of insight Jill Bolte Taylor 21190883 2008-02-27
 8                                                Why we do what we do      Tony Robbins 20685401 2006-02-02
 9                   This is what happens when you reply to spam email      James Veitch 20475972 2015-12-08
10                   Looks aren't everything. Believe me, I'm a model.   Cameron Russell 19787465 2012-10-27
11                                            The puzzle of motivation          Dan Pink 18830983 2009-07-24
12                                             The power of introverts        Susan Cain 17629275 2012-02-28
13                                                  How to spot a liar      Pamela Meyer 16861578 2011-07-13
14 What makes a good life? Lessons from the longest study on happiness  Robert Waldinger 16601927 2015-11-14
15                                     The happy secret to better work       Shawn Achor 16209727 2011-05-11

Observations

Let us make a bar chart to visualise these 15 talks in terms of the number of views they garnered.

ggplot(pop_talks, aes(x = reorder(main_speaker, views), y = views/100000, 
                      fill = main_speaker)) +
  geom_bar(stat = 'identity') +
  guides(fill = "none") + labs(x = "", y = "Views (x 100,000)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Finally, in this section, let us investigate the summary statistics and the distribution of the views garnered on various TED Talks.

ggplot(df, aes(views)) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_line(stat = "density") + xlim(c(0, 0.4e7))

summary(df$views)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   50443   755793  1124524  1698297  1700760 47227110 

The average number of views on TED Talks is about 1.7 million, and the median is 1.12 million. This suggests a very high average level of popularity of TED Talks. We also notice that the majority of talks have fewer than 4 million views. We will use this as the cutoff point when constructing box plots in later sections.

Comments

Although the TED website gives us access to all the comments posted publicly, this dataset only gives us the number of comments. We will therefore have to restrict our analysis to this feature only. You could try performing textual analysis by scraping the website for comments.

summary(df$comments)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    2.0    63.0   118.0   191.6   221.8  6404.0 

Observations

ggplot(df, aes(comments)) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_line(stat = "density") + xlim(c(0, 500))

From the plot above, we can see that the bulk of the talks have fewer than 500 comments. This clearly suggests that the mean obtained above has been heavily influenced by outliers. This is possible because the number of samples is only 2550 talks.
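
To illustrate how a single extreme value pulls the mean away from the median, here is a small sketch. The comment counts are made up for the example and are not taken from the dataset.

```r
# Toy comment counts (made-up values): most talks are modest, one is extreme
comments <- c(40, 63, 90, 118, 150, 222, 6404)

mean_comments   <- mean(comments)    # pulled upwards by the single large value
median_comments <- median(comments)  # middle value, unaffected by the extreme

# Dropping the extreme value changes the mean a lot but the median only a little
mean_trimmed   <- mean(comments[comments < 1000])
median_trimmed <- median(comments[comments < 1000])

print(c(mean = mean_comments, median = median_comments))
print(c(mean = mean_trimmed, median = median_trimmed))
```

This is why the median is usually the safer summary for heavily skewed counts such as views and comments.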

Another question that I am interested in is whether the number of views is correlated with the number of comments. We would expect this to be the case, as more popular videos tend to have more comments. Let us find out.

library(ggExtra)
ggMarginal(
  ggplot(df, aes(x = views, y = comments)) + geom_point(),
  type = "histogram")

cor(df[, c("views", "comments")])
             views  comments
views    1.0000000 0.5309387
comments 0.5309387 1.0000000

As the scatterplot and the correlation matrix show, the Pearson coefficient is slightly more than 0.5. This suggests a medium to strong correlation between the two quantities, as expected. Let us now check the number of views and comments on the 10 most commented TED Talks of all time.
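
Because the Pearson coefficient is sensitive to extreme values, a rank-based Spearman coefficient can serve as a complementary check. The sketch below uses made-up numbers for six talks, so only the mechanics carry over to the real data.

```r
# Made-up views and comments for six talks; the last talk is an extreme outlier
views    <- c(500000, 800000, 1100000, 1700000, 4300000, 47000000)
comments <- c(150, 60, 120, 220, 6400, 4500)

pearson  <- cor(views, comments, method = "pearson")   # linear association
spearman <- cor(views, comments, method = "spearman")  # association between ranks

print(c(pearson = pearson, spearman = spearman))
```

On the actual data frame, the equivalent call would be `cor(df$views, df$comments, method = "spearman")`.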

df %>%
  select(title, main_speaker, views, comments) %>% 
  arrange(desc(comments)) %>% 
  head(10)
# A tibble: 10 x 4
                                      title      main_speaker    views comments
                                      <chr>             <chr>    <int>    <int>
 1                         Militant atheism   Richard Dawkins  4374792     6404
 2              Do schools kill creativity?      Ken Robinson 47227110     4553
 3       Science can answer moral questions        Sam Harris  3433437     3356
 4                     My stroke of insight Jill Bolte Taylor 21190883     2877
 5        How do you explain consciousness?    David Chalmers  2162764     2673
 6             Taking imagination seriously    Janet Echelman  1832930     2492
 7                     On reading the Koran   Lesley Hazleton  1847256     2374
 8 Your body language may shape who you are         Amy Cuddy 43155405     2290
 9             The danger of science denial   Michael Specter  1838628     2272
10         How great leaders inspire action       Simon Sinek 34309432     1930

As can be seen above, Richard Dawkins’ talk on ‘Militant Atheism’ generated the greatest amount of discussion and opinions despite having significantly fewer views than Ken Robinson’s talk, which is second in the list. This raises some interesting questions.

Which talks tend to attract the largest amount of discussion?

To answer this question, we will define a new feature discussion quotient which is simply the ratio of the number of comments to the number of views. We will then check which talks have the largest discussion quotient.

df <- mutate(df, dis_quo = comments/views)
df %>% 
  select(title, main_speaker, views, comments, dis_quo, film_date) %>% 
  arrange(desc(dis_quo)) %>% 
  head(10)
# A tibble: 10 x 6
                                 title          main_speaker   views comments     dis_quo  film_date
                                 <chr>                 <chr>   <int>    <int>       <dbl>     <date>
 1      The case for same-sex marriage       Diane J. Savino  292395      649 0.002219600 2009-12-02
 2              E-voting without fraud         David Bismark  543551      834 0.001534355 2010-07-14
 3                    Militant atheism       Richard Dawkins 4374792     6404 0.001463841 2002-02-02
 4 Inside a school for suicide bombers Sharmeen Obaid-Chinoy 1057238     1502 0.001420683 2010-02-10
 5        Taking imagination seriously        Janet Echelman 1832930     2492 0.001359572 2011-03-03
 6                On reading the Koran       Lesley Hazleton 1847256     2374 0.001285149 2010-10-10
 7        Curating humanity's heritage     Elizabeth Lindsey  439180      555 0.001263719 2010-12-08
 8   How do you explain consciousness?        David Chalmers 2162764     2673 0.001235918 2014-03-18
 9        The danger of science denial       Michael Specter 1838628     2272 0.001235704 2010-02-11
10           Dance to change the world      Mallika Sarabhai  481834      595 0.001234865 2009-11-04

This analysis has yielded some extremely interesting insights. Half of the talks in the top 10 are along the lines of Faith and Religion. I suspect science and religion is still a very hotly debated topic even in the 21st century. We shall come back to this hypothesis in a later section.

The most discussed talk, though, is The Case for Same Sex Marriage (which has religious undertones). This is not that surprising considering the amount of debate the topic caused back in 2009 (the time the talk was filmed).


Analysing TED Talks by the month and the year

TED (especially TEDx) Talks tend to occur all throughout the year. Is there a hot month as far as TED is concerned? In other words, how are the talks distributed throughout the months since its inception? Let us find out.

library(lubridate)
df$month <- month(df$film_date, label = TRUE)
ggplot(df, aes(x = month, fill = month)) + 
  geom_bar() + guides(fill = "none")

February is clearly the most popular month for TED Conferences whereas August and January are the least popular. February’s popularity is largely due to the fact that the official TED Conferences are held in February. Let us check the distribution for TEDx talks only.

df_x <- df %>%
  filter(grepl("TEDx", event))

df_x %>%
  group_by(month) %>%
  count() %>%
  ggplot(aes(x = month, y = n, fill = month)) + 
  geom_bar(stat = 'identity') + guides(fill = "none")

As far as TEDx talks are concerned, November is the most popular month. However, we cannot take this result at face value: very few TEDx talks are actually uploaded to the TED website, so it is entirely possible that the sample in our dataset is not at all representative of all TEDx talks. A slightly more accurate statement would be that the most popular TEDx talks tend to take place in October and November.

The next question I’m interested in is which days are the most popular for conducting TED and TEDx conferences. The approach is very similar to the procedure applied for months.

df %>%
  mutate(day = wday(film_date, label = TRUE, week_start = 1)) %>%
  group_by(day) %>%
  count() %>%
  ggplot(aes(x = day, y = n, fill = day)) + 
  geom_bar(stat = 'identity') + guides(fill = "none")

The distribution of days is almost a bell curve, with Wednesday and Thursday being the most popular days and Sunday the least popular. This is pretty interesting because I was of the opinion that most TED Conferences would happen over the weekend.

Let us now visualize the number of TED talks through the years and check if our hunch that they have grown significantly is indeed true.

df$year <- year(df$film_date)
df %>%
  group_by(year) %>%
  count() %>%
  ggplot(aes(x = year, y = n)) + 
  geom_line() + geom_point() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Observations

Finally, to put it all together, let us construct a heat map that shows us the number of talks by month and year. This will give us a good summary of the distribution of talks.

df %>%
  group_by(month, year) %>% 
  count() %>%
  ggplot(aes(x = year, y = month)) +
  geom_tile(aes(fill = n)) +
  geom_text(aes(label = n), size = 2.5,
            position = position_jitter(height = .25)) +
  scale_fill_gradient(low = "white", high = "red") +
  guides(fill = "none") + labs(x = "Year", y = "Month")


TED Speakers

In this section, we will try to gain insight into all the amazing speakers who have managed to inspire millions of people through their talks on the TED platform. The first question we shall ask is who the most popular TED Speakers are, that is, which speakers have given the most TED Talks.

group_by(df, main_speaker) %>% 
  count() %>%
  arrange(desc(n))
# A tibble: 2,156 x 2
# Groups:   main_speaker [2,156]
           main_speaker     n
                  <chr> <int>
 1         Hans Rosling     9
 2        Juan Enriquez     7
 3        Marco Tempest     6
 4                Rives     6
 5           Bill Gates     5
 6          Clay Shirky     5
 7           Dan Ariely     5
 8 Jacqueline Novogratz     5
 9      Julian Treasure     5
10  Nicholas Negroponte     5
# ... with 2,146 more rows

Hans Rosling, the Swedish professor of health, is clearly the most popular TED Speaker, with 9 appearances on the TED stage. Juan Enriquez comes second with 7 appearances. Rives and Marco Tempest have each graced the TED platform 6 times.

Which occupation should you choose if you want to become a TED Speaker? Let us have a look at what kind of people TED is most interested in inviting to its events.

occupation_df <- group_by(df, speaker_occupation) %>% 
  count() %>%
  arrange(desc(n))
occupation_df
# A tibble: 1,449 x 2
# Groups:   speaker_occupation [1,449]
   speaker_occupation     n
                <chr> <int>
 1             Writer    45
 2             Artist    34
 3           Designer    34
 4         Journalist    33
 5       Entrepreneur    31
 6          Architect    30
 7           Inventor    27
 8       Psychologist    26
 9       Photographer    25
10          Filmmaker    21
# ... with 1,439 more rows
ggplot(head(occupation_df, 10), aes(x = reorder(speaker_occupation, n), 
                                   y = n, fill = speaker_occupation)) + 
  geom_bar(stat = "identity") + guides(fill = "none") +
  labs(x = "") + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Observations

Do some professions tend to attract a larger number of viewers? To answer this question let us visualise the relationship between the top 10 most popular professions and the views they garnered in the form of a box plot.

df %>%
  filter(speaker_occupation %in% head(occupation_df$speaker_occupation, 10)) %>%
  ggplot(aes(x = speaker_occupation, y = views, fill = speaker_occupation)) +
  geom_boxplot() +
  labs(x = "") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  guides(fill = "none")

On average, out of the top 10 most popular professions, Psychologists tend to garner the most views. Writers have the greatest range of views between the first and the third quartile.

Finally, let us check the number of talks which have had more than one speaker.

table(df$num_speaker)

   1    2    3    4    5 
2492   49    5    3    1 

Almost every talk has just one speaker. There are close to 50 talks where two people shared the stage. The maximum number of speakers to share a single stage was 5. I suspect this was a dance performance. Let’s have a look.

filter(df, num_speaker == 5) %>%
  select(title, description, main_speaker, event)
# A tibble: 1 x 4
                          title
                          <chr>
1 A dance to honor Mother Earth
# ... with 3 more variables: description <chr>, main_speaker <chr>, event <chr>

My hunch was correct. It is a talk titled A dance to honor Mother Earth by Jon Boogz and Lil Buck at the TED 2017 Conference.


TED Events

Which TED Events tend to produce the most talks deemed worthy of upload to TED.com? We will try to answer that question in this section.

count(df, event) %>%
  arrange(desc(n)) %>% head()
# A tibble: 6 x 2
    event     n
    <chr> <int>
1 TED2014    84
2 TED2009    83
3 TED2013    77
4 TED2016    77
5 TED2015    75
6 TED2011    70

As expected, the official TED events account for the major share of TED Talks published on the TED.com platform. TED2014 had the most talks, followed by TED2009. There isn’t too much insight to be gained from this.


TED Languages

One remarkable aspect of TED Talks is the sheer number of languages in which they are accessible. Let us perform some very basic data visualization and descriptive statistics about languages at TED.

summary(df$languages)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   23.00   28.00   27.33   33.00   72.00 

On average, a TED Talk is available in 27 different languages. The maximum number of languages a TED Talk is available in is a staggering 72. Let us check which talk this is.

filter(df, languages == 72)
# A tibble: 1 x 20
                                       name                         title
                                      <chr>                         <chr>
1 Matt Cutts: Try something new for 30 days Try something new for 30 days
# ... with 18 more variables: description <chr>, main_speaker <chr>, speaker_occupation <chr>,
#   num_speaker <int>, duration <int>, event <chr>, film_date <date>, published_date <date>, comments <int>,
#   tags <chr>, languages <int>, ratings <chr>, related_talks <chr>, url <chr>, views <int>, dis_quo <dbl>,
#   month <ord>, year <dbl>

The most translated TED Talk of all time is Matt Cutts’ Try something new for 30 days. The talk does have a very universal theme of exploration. The sheer number of languages it’s available in demands a little more inspection though, as it has just over 8 million views, far fewer than the most popular TED Talks.

Finally, let us check if there is a correlation between the number of views and the number of languages a talk is available in. We would think that this should be the case since the talk is more accessible to a larger number of people but as Matt Cutts’ talk shows, it may not really be the case.

ggMarginal(
  ggplot(df, aes(x = languages, y = views)) + 
  geom_point()
)

cor(df[, c("languages","views")])
          languages     views
languages 1.0000000 0.3776231
views     0.3776231 1.0000000

The Pearson coefficient is 0.38 suggesting a medium correlation between the aforementioned quantities.


TED Themes

In this section, we will try to find out the most popular themes in the TED conferences. Although TED started out as a conference about technology, entertainment and design, it has since diversified into virtually every field of study and walk of life. It will be interesting to see if this conference with Silicon Valley origins has a bias towards certain topics.

To answer this question, we need to wrangle our data into a form suitable for analysis. More specifically, we need to split the tags list into separate rows, one per talk and theme.

library(qdapRegex)
theme_df <- do.call("rbind", Map(function(title, theme) 
  tibble(title = title, theme = theme),
  df$title, rm_between(df$tags, "'", "'", extract = TRUE))) %>%
  merge(df, by = "title")

This approach yields one extra theme, “,”, compared with the original notebook. It is not relevant for the analysis anyway.
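
For comparison, the same splitting can be sketched in base R without qdapRegex. This is only an illustration on two made-up rows in the format of the tags column; tags that themselves contain an apostrophe would need extra handling, which this sketch does not attempt.

```r
# Two made-up rows in the same format as the `tags` column
talks <- data.frame(
  title = c("Talk A", "Talk B"),
  tags  = c("['children', 'creativity', 'culture']",
            "['science', 'global issues']"),
  stringsAsFactors = FALSE
)

# Extract every run of characters enclosed in single quotes, quotes included
quoted <- regmatches(talks$tags, gregexpr("'[^']*'", talks$tags))

# Strip the surrounding quotes from each extracted tag
themes <- lapply(quoted, function(x) gsub("'", "", x, fixed = TRUE))

# Build one row per (title, theme) pair
theme_long <- data.frame(
  title = rep(talks$title, lengths(themes)),
  theme = unlist(themes),
  stringsAsFactors = FALSE
)
print(theme_long)
```

Matching the quoted strings themselves, rather than whatever sits between two quote characters, keeps separators such as commas out of the result.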

length(table(theme_df$theme))
[1] 417
pop_themes <- group_by(theme_df, theme) %>%
  count() %>%
  arrange(desc(n))
pop_themes
# A tibble: 417 x 2
# Groups:   theme [417]
           theme     n
           <chr> <int>
 1    technology   726
 2       science   564
 3 global issues   501
 4       culture   486
 5          TEDx   449
 6        design   418
 7      business   347
 8 entertainment   299
 9        health   230
10    innovation   229
# ... with 407 more rows
ggplot(head(pop_themes,10), aes(x = reorder(theme, n), y = n, fill = theme)) + 
  geom_bar(stat = "identity") + 
  guides(fill = "none") + labs(x = "") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

As may have been expected, Technology is the most popular topic for talks. The other two original factions, Design and Entertainment, also make it to the list of top 10 themes. Science and Global Issues are the second and the third most popular themes respectively.

Next, I want to look at trends in the share of topics of TED Talks over time. Has the demand for Technology talks increased? Do certain years have a disproportionate share of talks related to global issues? Let’s find out!

We will only consider the top 7 themes (excluding TEDx) and talks filmed from 2009 onwards, the year when the number of TED Talks really peaked.

pop_theme_talks <- theme_df %>% 
  filter(theme %in% head(pop_themes$theme, 8), 
         theme !="TEDx",
         year > 2008)

xtab_df <- pop_theme_talks %>%
  group_by(theme, year) %>%
  tally %>%
  group_by(year) %>% 
  mutate(prop = n/sum(n))
ggplot(xtab_df, aes(x = year, fill = theme, y = prop)) + 
  geom_bar(stat = 'identity', position = 'fill')

ggplot(xtab_df, aes(x = year, y= prop, group = theme, col = theme)) + 
  geom_line()

The proportion of technology talks has steadily increased over the years with a slight dip in 2010. This is understandable considering the boom of technologies such as blockchain, deep learning and augmented reality capturing people’s imagination.

Talks on culture have witnessed a dip, decreasing steadily starting 2013. The share of culture talks has been the least in 2017. Entertainment talks also seem to have witnessed a slight decline in popularity since 2009.
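
The per-year shares behind these plots can be cross-checked with base R. The sketch below uses a handful of made-up theme tags, so the numbers are purely illustrative.

```r
# Made-up theme tags by year, standing in for the (theme, year) pairs
tagged <- data.frame(
  year  = c(2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2010),
  theme = c("technology", "technology", "science", "culture",
            "technology", "science", "science", "culture", "culture"),
  stringsAsFactors = FALSE
)

# Count tags per theme and year, then divide each column by its year total
counts <- table(tagged$theme, tagged$year)
shares <- prop.table(counts, margin = 2)  # margin = 2: proportions within each year

print(round(shares, 2))
```

Each column of `shares` sums to 1, which is the same normalisation as grouping by year and computing `n/sum(n)`.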

As with the speaker occupations, let us investigate whether certain topics tend to garner more views than others. We will do this analysis for the top ten themes that we found earlier, again using a box plot.

theme_df %>%
  filter(theme %in% head(pop_themes$theme, 10)) %>%
  ggplot(aes(x = theme, y = views, fill = theme)) + 
  geom_boxplot() + labs(x = "") +
  guides(fill = "none") + coord_cartesian(ylim = c(0, 0.4e7)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Although culture has lost some of its share of TED Talks over the years, culture talks garner the highest median number of views.


Talk Duration and Word Counts

In this section, we will perform analysis on the length of TED Talks. TED is famous for imposing a very strict time limit of 18 minutes. Although this is the suggested limit, there have been talks as short as 2 minutes and some have stretched to as long as 24 minutes. Let us get an idea of the distribution of TED Talk durations.

df$duration <- df$duration/60
summary(df$duration)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.250   9.617  14.133  13.775  17.446  87.600 

TED Talks are, on average, 13.8 minutes long. I find this statistic surprising because TED Talks are often synonymous with 18 minutes, and the average is a good 4 minutes shorter than that.

The shortest TED Talk on record is 2.25 minutes long whereas the longest talk is 87.6 minutes long. I’m pretty sure the longest talk was not actually a TED Talk. Let us look at both the shortest and the longest talk.

filter(df, duration == 2.25 | duration == 87.6)
# A tibble: 2 x 20
                                                 name                                title
                                                <chr>                                <chr>
1          Murray Gell-Mann: The ancestor of language             The ancestor of language
2 Douglas Adams: Parrots, the universe and everything Parrots, the universe and everything
# ... with 18 more variables: description <chr>, main_speaker <chr>, speaker_occupation <chr>,
#   num_speaker <int>, duration <dbl>, event <chr>, film_date <date>, published_date <date>, comments <int>,
#   tags <chr>, languages <int>, ratings <chr>, related_talks <chr>, url <chr>, views <int>, dis_quo <dbl>,
#   month <ord>, year <dbl>

The shortest talk was at TED2007 titled The ancestor of language by Murray Gell-Mann. The longest talk on TED.com, as we had guessed, is not a TED Talk at all. Rather, it was a talk titled Parrots, the universe and everything delivered by Douglas Adams at the University of California in 2001.
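
Filtering on exact duration values works here, but comparing floating-point numbers with `==` can be fragile in general. As a hedged alternative, the extremes can be selected by position. The data frame below is made up for illustration.

```r
# Made-up durations in seconds for four talks
talks <- data.frame(
  title    = c("Talk A", "Talk B", "Talk C", "Talk D"),
  duration = c(135, 848, 1164, 5256),
  stringsAsFactors = FALSE
)
talks$duration_min <- talks$duration / 60

# Pick rows by the position of the extremes instead of comparing doubles with ==
extremes <- talks[c(which.min(talks$duration_min),
                    which.max(talks$duration_min)), ]
print(extremes)
```

Note that `which.min()` and `which.max()` return only the first match, so ties would need a separate check.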

Let us now check for any correlation between the popularity and the duration of a TED Talk. To make sure we only include TED Talks, we will consider only those talks which have a duration less than 25 minutes.

ggMarginal(
  filter(df, duration < 25) %>%
  ggplot(aes(x = duration, y = views)) + geom_point()
)

There seems to be almost no correlation between these two quantities, which suggests that the length of a TED Talk has little bearing on its popularity. Content is king at TED.

Next, we look at transcripts to get an idea of word count. For this, we introduce our second dataset, the one which contains all transcripts.

df2 <- read_csv("data/transcripts.csv")
glimpse(df2)
Observations: 2,467
Variables: 2
$ transcript <chr> "Good morning. How are you?(Laughter)It's been great, hasn't it? I've been blown away ...
$ url        <chr> "https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity\n", "https://www....

It seems that we have data available for 2467 talks. Let us perform a join of the two dataframes on the url feature to include word counts for every talk.
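
Before running the join, it may help to see how `merge()` treats talks that have no transcript. The sketch below uses two tiny made-up data frames; the short url values are placeholders, not real TED URLs.

```r
# Made-up metadata and transcripts; talk "c" has no transcript
meta <- data.frame(url = c("a", "b", "c"), views = c(100, 200, 300),
                   stringsAsFactors = FALSE)
transcripts <- data.frame(url = c("a", "b"),
                          transcript = c("Good morning.", "Thank you."),
                          stringsAsFactors = FALSE)

# Default merge() is an inner join: talks without a transcript are dropped
inner <- merge(meta, transcripts, by = "url")

# all.x = TRUE keeps every talk and fills missing transcripts with NA
left <- merge(meta, transcripts, by = "url", all.x = TRUE)

print(nrow(inner))  # 2
print(nrow(left))   # 3
```

An inner join is the natural choice when every remaining row needs a transcript, as is the case for word counts.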

df3 <- merge(df, df2, by = "url")
glimpse(df3)
Observations: 2,467
Variables: 21
$ url                <chr> "https://www.ted.com/talks/9_11_healing_the_mothers_who_found_forgiveness_frie...
$ name               <chr> "Aicha el-Wafi + Phyllis Rodriguez: The mothers who found forgiveness, friends...
$ title              <chr> "The mothers who found forgiveness, friendship", "My year of living biblically...
$ description        <chr> "Phyllis Rodriguez and Aicha el-Wafi have a powerful friendship born of unthin...
$ main_speaker       <chr> "Aicha el-Wafi + Phyllis Rodriguez", "AJ Jacobs", "Markus Fischer", "Improv Ev...
$ speaker_occupation <chr> "9/11 mothers", "Author", "Designer", "Social energy entrepreneur", "Whistler"...
$ num_speaker        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, ...
$ duration           <dbl> 9.900000, 17.666667, 6.316667, 3.816667, 11.933333, 9.833333, 14.266667, 15.45...
$ event              <chr> "TEDWomen 2010", "EG 2007", "TEDGlobal 2011", "TED2012", "TEDxRotterdam 2010",...
$ film_date          <date> 2010-12-12, 2007-12-02, 2011-07-15, 2012-03-01, 2010-06-04, 2014-10-21, 2016-...
$ published_date     <date> 2011-05-02, 2008-07-17, 2011-07-22, 2012-03-09, 2011-02-11, 2014-12-05, 2017-...
$ comments           <int> 149, 583, 440, 324, 93, 48, 36, 850, 79, 333, 104, 194, 95, 124, 231, 56, 29, ...
$ tags               <chr> "['culture', 'friendship', 'global issues', 'parenting', 'terrorism']", "['com...
$ languages          <int> 32, 39, 45, 51, 31, 39, 20, 36, 27, 30, 26, 26, 28, 28, 33, 28, 19, 20, 23, 24...
$ ratings            <chr> "[{'id': 10, 'name': 'Inspiring', 'count': 385}, {'id': 1, 'name': 'Beautiful'...
$ related_talks      <chr> "[{'id': 968, 'hero': 'https://pe.tedcdn.com/images/ted/202850_800x600.jpg', '...
$ views              <int> 820976, 2291701, 6264902, 2950307, 1917442, 817014, 896491, 1347633, 1474192, ...
$ dis_quo            <dbl> 1.814913e-04, 2.543962e-04, 7.023254e-05, 1.098191e-04, 4.850212e-05, 5.875052...
$ month              <ord> Dec, Dec, Jul, Mar, Jun, Oct, Feb, Sep, Mar, Mar, Mar, Jun, Jun, Feb, Jul, May...
$ year               <dbl> 2010, 2007, 2011, 2012, 2010, 2014, 2016, 2010, 2011, 2011, 2015, 2013, 2016, ...
$ transcript         <chr> "Phyllis Rodriguez: We are here today because of the fact that we have what mo...
# Count whitespace-separated words in each transcript
df3$wc <- lengths(strsplit(df3$transcript, split = "\\s+"))
summary(df3$wc)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    1332    2028    2040    2707    9044 

The average TED Talk has around 2,040 words (median 2,028), and the spread is wide: the middle half of the talks ranges from 1,332 to 2,707 words. The longest talk runs to 9,044 words.
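Note that `summary()` reports quartiles but no standard deviation, so that figure needs a separate call to `sd()`. The following is a minimal sketch of the word count and its spread on toy data; the `transcripts` vector and the `count_words()` helper are hypothetical stand-ins for illustration, not part of the original notebook.

```r
# Toy transcripts standing in for df3$transcript (hypothetical data).
transcripts <- c(
  "Thank you so much for having me here today",
  "Ideas   worth\nspreading",
  "(Music)"
)

# Hypothetical helper: count whitespace-separated tokens per transcript.
# trimws() avoids an empty leading token when a transcript starts with
# whitespace, which would otherwise inflate the count by one.
count_words <- function(x) {
  lengths(strsplit(trimws(x), "\\s+"))
}

wc <- count_words(transcripts)

print(wc)
# Mean and standard deviation of the word counts.
print(c(mean = mean(wc), sd = sd(wc)))
```

On the real data, the equivalent calls would be `mean(df3$wc)` and `sd(df3$wc)`.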

As with duration, we should not expect much correlation between the number of words and views. We will move on to a more interesting statistic: the number of words per minute.
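The expectation of little correlation between word count and views can be checked directly rather than assumed. The sketch below uses a small toy data frame with made-up values (the `toy` object and its numbers are hypothetical, not taken from the TED dataset) and computes both a Pearson and a Spearman coefficient, since view counts tend to be dominated by a few very popular talks.

```r
# Hypothetical toy values standing in for df3$wc and df3$views.
toy <- data.frame(
  wc    = c(1200, 2100, 1800, 2600, 900, 3100),
  views = c(950000, 400000, 2100000, 700000, 1500000, 820000)
)

# Pearson correlation: measures linear association and is sensitive
# to a handful of extreme values.
pearson <- cor(toy$wc, toy$views)

# Spearman rank correlation: based on ranks, so less affected by outliers.
spearman <- cor(toy$wc, toy$views, method = "spearman")

print(round(c(pearson = pearson, spearman = spearman), 3))
```

On the real data the same check would be `cor(df3$wc, df3$views)`, with `use = "complete.obs"` added if either column contains missing values.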

# Words per minute (duration is already expressed in minutes)
df3$wpm <- df3$wc / df3$duration
summary(df3$wpm)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
  0.08086 133.27105 149.95733 147.10305 165.27871 247.36486 
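The minimum of roughly 0.08 words per minute matches the one-word minimum in the word counts, which suggests a talk whose transcript is essentially empty. Performances are one possibility, though that is an assumption and is not verified here. The sketch below shows one way such rows might be excluded before summarising speaking rates; the `toy` data and the `min_wpm` cutoff are hypothetical and chosen only for illustration.

```r
# Hypothetical toy rows standing in for df3; not real TED data.
toy <- data.frame(
  title    = c("Talk A", "Talk B", "Performance C"),
  wc       = c(2000, 1500, 1),
  duration = c(13.5, 10.0, 12.4)  # minutes
)

toy$wpm <- toy$wc / toy$duration

# min_wpm is an arbitrary illustrative cutoff, not a value from the source.
min_wpm <- 50

# Keep only rows whose speaking rate is at or above the cutoff.
spoken <- toy[toy$wpm >= min_wpm, ]

print(summary(spoken$wpm))
```

Any such cutoff is a judgment call, so it is worth inspecting the excluded rows before dropping them.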

The average TED speaker delivers about 147 words per minute (median 150). The fastest talker spoke a staggering 247 words a minute, well above the 125-150 words per minute usually cited as the average for English. Let us see who this is!

filter(df3, wpm > 245)
                                                                             url
1 https://www.ted.com/talks/mae_jemison_on_teaching_arts_and_sciences_together\n
                                           name                            title
1 Mae Jemison: Teach arts and sciences together Teach arts and sciences together
                                                                                                                                                                                                                                                        description
1 Mae Jemison is an astronaut, a doctor, an art collector, a dancer ... Telling stories from her own education and from her time in space, she calls on educators to teach both the arts and sciences, both intuition and logic, as one -- to create bold thinkers.
  main_speaker                                        speaker_occupation num_speaker duration   event
1  Mae Jemison Astronaut, engineer, entrepreneur, physician and educator           1     14.8 TED2002
   film_date published_date comments
1 2002-02-02     2009-05-05       99
                                                                                          tags languages
1 ['art', 'dance', 'education', 'future', 'science', 'science and art', 'space', 'technology']        20
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ratings
1 [{'id': 24, 'name': 'Persuasive', 'count': 126}, {'id': 10, 'name': 'Inspiring', 'count': 243}, {'id': 22, 'name': 'Fascinating', 'count': 83}, {'id': 25, 'name': 'OK', 'count': 86}, {'id': 26, 'name': 'Obnoxious', 'count': 17}, {'id': 21, 'name': 'Unconvincing', 'count': 67}, {'id': 8, 'name': 'Informative', 'count': 107}, {'id': 11, 'name': 'Longwinded', 'count': 68}, {'id': 3, 'name': 'Courageous', 'count': 42}, {'id': 9, 'name': 'Ingenious', 'count': 34}, {'id': 2, 'name': 'Confusing', 'count': 10}, {'id': 7, 'name': 'Funny', 'count': 14}, {'id': 1, 'name': 'Beautiful', 'count': 55}, {'id': 23, 'name': 'Jaw-dropping', 'count': 18}]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     related_talks
1 [{'id': 66, 'hero': 'https://pe.tedcdn.com/images/ted/6b6eb940bceab359ca676a9b486aae475c1df883_2880x1620.jpg', 'speaker': 'Ken Robinson', 'title': 'Do schools kill creativity?', 'duration': 1164, 'slug': 'ken_robinson_says_schools_kill_creativity', 'viewed_count': 47227861}, {'id': 1571, 'hero': 'https://pe.tedcdn.com/images/ted/d538403350630eeaf965325257caf869350a9832_1600x1200.jpg', 'speaker': 'John Maeda', 'title': 'How art, technology and design inform creative leaders', 'duration': 1001, 'slug': 'john_maeda_how_art_technology_and_design_inform_creative_leaders', 'viewed_count': 1045694}, {'id': 1657, 'hero': 'https://pe.tedcdn.com/images/ted/f31f296ffb6902d226e403e6431713cf37629b55_1600x1200.jpg', 'speaker': 'Mitch Resnick', 'title': "Let's teach kids to code", 'duration': 1008, 'slug': 'mitch_resnick_let_s_teach_kids_to_code', 'viewed_count': 1724947}, {'id': 952, 'hero': 'https://pe.tedcdn.com/images/ted/197744_800x600.jpg', 'speaker': 'Ben Cameron', 'title': 'Why the live arts matter', 'duration': 764, 'slug': 'ben_cameron_tedxyyc', 'viewed_count': 497709}, {'id': 1653, 'hero': 'https://pe.tedcdn.com/images/ted/e562bc9bf7daf7d4f06450eff3d5af17a5e873f8_2880x1620.jpg', 'speaker': 'Young-ha Kim', 'title': 'Be an artist, right now!', 'duration': 1017, 'slug': 'young_ha_kim_be_an_artist_right_now', 'viewed_count': 1841666}, {'id': 1747, 'hero': 'https://pe.tedcdn.com/images/ted/910102e6486442bc30fbc5952c254a9f9882942f_1600x1200.jpg', 'speaker': 'Phil Hansen', 'title': 'Embrace the shake', 'duration': 601, 'slug': 'phil_hansen_embrace_the_shake', 'viewed_count': 2155393}]
   views      dis_quo month year
1 744257 0.0001330186   Feb 2002
  transcript
1 What I want to do today is to spend some time talking about some stuff that's sort of giving me a little bit of existential angst, for lack of a better word, over the past couple of years, and basically, these three quotes tell what's going on. "When God made the color purple, God was just showing off," Alice Walker wrote in "The Color Purple," and Zora Neale Hurston wrote in "Dust Tracks On A Road," "Research is a formalized curiosity. It's poking and prying with a purpose." And then finally, when I think about the near future, you know, we have this attitude, well, whatever happens, happens. Right? So that goes along with the Chesire Cat saying, "If you don't care much where you want to get to, it doesn't much matter which way you go." But I think it does matter which way we go, and what road we take, because when I think about design in the near future, what I think are the most important issues, what's really crucial and vital is that we need to revitalize the arts and sciences right now in 2002. (Applause) If we describe the near future as 10, 20, 15 years from now, that means that what we do today is going to be critically important, because in the year 2015, and the year 2020, 2025, the world our society is going to be building on, the basic knowledge and abstract ideas, the discoveries that we came up with today, just as all these wonderful things we're hearing about here at the TED conference that we take for granted in the world right now, were really knowledge and ideas that came up in the '50s, the '60s, and the '70s. 
That's the substrate that we're exploiting today, whether it's the internet, genetic engineering, laser scanners, guided missiles, fiber optics, high-definition television, sensing, remote-sensing from space and the wonderful remote-sensing photos that we see in 3D weaving, TV programs like Tracker, and Enterprise, CD rewrite drives, flatscreen, Alvin Ailey's Suite Otis, or Sarah Jones' "Your Revolution Will Not Be Between These Thighs," which by the way was banned by the FCC, or ska, all of these things without question, almost without exception, are really based on ideas and abstract and creativity from years before, so we have to ask ourselves, what are we contributing to that legacy right now? And when I think about it, I'm really worried. To be quite frank, I'm concerned. I'm skeptical that we're doing very much of anything. We're, in a sense, failing to act in the future. We're purposefully, consciously being laggards. We're lagging behind. Frantz Fanon, who was a psychiatrist from Martinique, said, "Each generation must, out of relative obscurity, discover its mission, and fulfill or betray it." What is our mission? What do we have to do? I think our mission is to reconcile, to reintegrate science and the arts, because right now there's a schism that exists in popular culture. You know, people have this idea that science and the arts are really separate. We think of them as separate and different things, and this idea was probably introduced centuries ago, but it's really becoming critical now, because we're making decisions about our society every day that, if we keep thinking that the arts are separate from the sciences, and we keep thinking it's cute to say, "I don't understand anything about this one, I don't understand anything about the other one," then we're going to have problems. Now I know no one here at TED thinks this. 
All of us, we already know that they're very connected, but I'm going to let you know that some folks in the outside world, believe it or not, they think it's neat when they say, "You know, scientists and science is not creative. Maybe scientists are ingenious, but they're not creative. And then we have this tendency, the career counselors and various people say things like, "Artists are not analytical. They're ingenious, perhaps, but not analytical," and when these concepts underly our teaching and what we think about the world, then we have a problem, because we stymie support for everything. By accepting this dichotomy, whether it's tongue-in-cheek, when we attempt to accommodate it in our world, and we try to build our foundation for the world, we're messing up the future, because, who wants to be uncreative? Who wants to be illogical? Talent would run from either of these fields if you said you had to choose either. Then they're going to go to something where they think, "Well, I can be creative and logical at the same time." Now I grew up in the '60s and I'll admit it, actually, my childhood spanned the '60s, and I was a wannabe hippie and I always resented the fact that I wasn't really old enough to be a hippie. And I know there are people here, the younger generation who want to be hippies, but people talk about the '60s all the time, and they talk about the anarchy that was there, but when I think about the '60s, what I took away from it was that there was hope for the future. We thought everyone could participate. There were wonderful, incredible ideas that were always percolating, and so much of what's cool or hot today is really based on some of those concepts, whether it's, you know, people trying to use the prime directive from Star Trek being involved in things, or again that three-dimensional weaving and fax machines that I read about in my weekly readers that the technology and engineering was just getting started. 
But the '60s left me with a problem. You see, I always assumed I would go into space, because I followed all of this, but I also loved the arts and sciences. You see, when I was growing up as a little girl and as a teenager, I loved designing and making dogs' clothes and wanting to be a fashion designer. I took art and ceramics. I loved dance. Lola Falana. Alvin Ailey. Jerome Robbins. And I also avidly followed the Gemini and the Apollo programs. I had science projects and tons of astronomy books. I took calculus and philosophy. I wondered about the infinity and the Big Bang theory. And when I was at Stanford, I found myself, my senior year, chemical engineering major, half the folks thought I was a political science and performing arts major, which was sort of true because I was Black Student Union President and I did major in some other things, and I found myself the last quarter juggling chemical engineering separation processes, logic classes, nuclear magnetic resonance spectroscopy, and also producing and choreographing a dance production, and I had to do the lighting and the design work, and I was trying to figure out, do I go to New York City to try to become a professional dancer, or do I go to medical school? Now, my mother helped me figure that one out. (Laughter) But when I went into space, when I went into space I carried a number of things up with me. I carried a poster by Alvin Ailey, which you can figure out now, I love the dance company. An Alvin Ailey poster of Judith Jamison performing the dance "Cry," dedicated to all black women everywhere. A Bundu statue, which was from the Women's Society in Sierra Leone, and a certificate for the Chicago Public School students to work to improve their science and math, and folks asked me, "Why did you take up what you took up?" 
And I had to say, "Because it represents human creativity, the creativity that allowed us, that we were required to have to conceive and build and launch the space shuttle, springs from the same source as the imagination and analysis it took to carve a Bundu statue, or the ingenuity it took to design, choreograph, and stage "Cry." Each one of them are different manifestations, incarnations, of creativity, avatars of human creativity, and that's what we have to reconcile in our minds, how these things fit together. The difference between arts and sciences is not analytical versus intuitive, right? E=MC squared required an intuitive leap, and then you had to do the analysis afterwards. Einstein said, in fact, "The most beautiful thing we can experience is the mysterious. It is the source of all true art and science." Dance requires us to express and want to express the jubilation in life, but then you have to figure out, exactly what movement do I do to make sure that it comes across correctly? The difference between arts and sciences is also not constructive versus deconstructive, right? A lot of people think of the sciences as deconstructive. You have to pull things apart. And yeah, sub-atomic physics is deconstructive. You literally try to tear atoms apart to understand what's inside of them. But sculpture, from what I understand from great sculptors, is deconstructive, because you see a piece and you remove what doesn't need to be there. Biotechnology is constructive. Orchestral arranging is constructive. So in fact we use constructive and deconstructive techniques in everything. The difference between science and the arts is not that they are different sides of the same coin, even, or even different parts of the same continuum, but rather they're manifestations of the same thing. Different quantum states of an atom? Or maybe if I want to be more 21st century I could say that they are different harmonic resonances of a superstring. But we'll leave that alone. 
(Laughter) They spring from the same source. The arts and sciences are avatars of human creativity. It's our attempt as humans to build an understanding of the universe, the world around us. It's our attempt to influence things, the universe internal to ourselves and external to us. The sciences, to me, are manifestations of our attempt to express or share our understanding, our experience, to influence the universe external to ourselves. It doesn't rely on us as individuals. It's the universe, as experienced by everyone, and the arts manifest our desire, our attempt to share or influence others through experiences that are peculiar to us as individuals. Let me say it again another way: science provides an understanding of a universal experience, and arts provides a universal understanding of a personal experience. That's what we have to think about, that they're all part of us, they're all part of a continuum. It's not just the tools, it's not just the sciences, you know, the mathematics and the numerical stuff and the statistics, because we heard, very much on this stage, people talked about music being mathematical. Right? Arts don't just use clay, aren't the only ones that use clay, light and sound and movement. They use analysis as well. So people might say, well, I still like that intuitive versus analytical thing, because everybody wants to do the right brain, left brain thing, right? We've all been accused of being right-brained or left-brained at some point in time, depending on who we disagreed with. (Laughter) You know, people say intuitive, you know that's like you're in touch with nature, in touch with yourself and relationships. Analytical: you put your mind to work, and I'm going to tell you a little secret. You all know this though, but sometimes people use this analysis idea, that things are outside of ourselves, to be, say, that this is what we're going to elevate as the true, most important sciences, right? 
And then you have artists, and you all know this is true as well, artists will say things about scientists because they say they're too concrete, they're disconnected with the world. But, we've even had that here on stage, so don't act like you don't know what I'm talking about. (Laughter) We had folks talking about the Flat Earth Society and flower arrangers, so there's this whole dichotomy that we continue to carry along, even when we know better. And folks say we need to choose either or. But it would really be foolish to choose either one, right? Intuitive versus analytical? That's a foolish choice. It's foolish, just like trying to choose between being realistic or idealistic. You need both in life. Why do people do this? I'm just gonna quote a molecular biologist, Sydney Brenner, who's 70 years old so he can say this. He said, "It's always important to distinguish between chastity and impotence." Now... (Laughter) I want to share with you a little equation, okay? How do understanding science and the arts fit into our lives and what's going on and the things that we're talking about here at the design conference, and this is a little thing I came up with, understanding and our resources and our will cause us to have outcomes. Our understanding is our science, our arts, our religion, how we see the universe around us, our resources, our money, our labor, our minerals, those things that are out there in the world we have to work with. But more importantly, there's our will. This is our vision, our aspirations of the future, our hopes, our dreams, our struggles and our fears. Our successes and our failures influence what we do with all of those, and to me, design and engineering, craftsmanship and skilled labor, are all the things that work on this to have our outcome, which is our human quality of life. Where do we want the world to be? And guess what? 
Regardless of how we look at this, whether we look at arts and sciences are separate or different, they're both being influenced now and they're both having problems. I did a project called S.E.E.ing the Future: Science, Engineering and Education, and it was looking at how to shed light on most effective use of government funding. We got a bunch of scientists in all stages of their careers. They came to Dartmouth College, where I was teaching, and they talked about with theologians and financiers, what are some of the issues of public funding for science and engineering research? What's most important about it? There are some ideas that emerged that I think have really powerful parallels to the arts. The first thing they said was that the circumstances that we find ourselves in today in the sciences and engineering that made us world leaders is very different than the '40s, the '50s, and the '60s and the '70s when we emerged as world leaders, because we're no longer in competition with fascism, with Soviet-style communism, and by the way that competition wasn't just military, it included social competition and political competition as well, that allowed us to look at space as one of those platforms to prove that our social system was better. Another thing they talked about was the infrastructure that supports the sciences is becoming obsolete. We look at universities and colleges, small, mid-sized community colleges across the country, their laboratories are becoming obsolete, and this is where we train most of our science workers and our researchers, and our teachers, by the way, and then that there's a media that doesn't support the dissemination of any more than the most mundane and inane of information. There's pseudo-science, crop circles, alien autopsy, haunted houses, or disasters. And that's what we see. 
And this isn't really the information you need to operate in everyday life and figure out how to participate in this democracy and determine what's going on. They also said that there's a change in the corporate mentality. Whereas government money had always been there for basic science and engineering research, we also counted on some companies to do some basic research, but what's happened now is companies put more energy into short-term product development than they do in basic engineering and science research. And education is not keeping up. In K through 12, people are taking out wet labs. They think if we put a computer in the room it's going to take the place of actually, we're mixing the acids, we're growing the potatoes. And government funding is decreasing in spending and then they're saying, let's have corporations take over, and that's not true. Government funding should at least do things like recognize cost-benefits of basic science and engineering research. We have to know that we have a responsibility as global citizens in this world. We have to look at the education of humans. We need to build our resources today to make sure that they're trained so that they understand the importance of these things, and we have to support the vitality of science, and that doesn't mean that everything has to have one thing that's going to go on, or we know exactly what's going to be the outcome of it, but that we support the vitality and the intellectual curiosity that goes along, and if you think about those parallels to the arts, the competition with the Bolshoi Ballet spurred the Joffrey and the New York City Ballet to become better. Infrastructure museums, theaters, movie houses across the country are disappearing. We have more television stations with less to watch, we have more money spent on rewrites to get old television programs in the movies. 
We have corporate funding now that, when it goes to some company, when it goes to support the arts, it almost requires that the product be part of the picture that the artist draws, and we have stadiums that are named over and over again by corporations. In Houston, we're trying to figure out what to do with that Enron Stadium thing. (Laughter) And fine arts and education in the schools is disappearing, and we have a government that seems like it's gutting the NEA and other programs, so we have to really stop and think, what are we trying to do with the sciences and the arts? There's a need to revitalize them. We have to pay attention to it. I just want to tell you really quickly what I'm doing. (Applause) I want to tell you what I've been doing a little bit since... I feel this need to sort of integrate some of the ideas that I've had and run across over time. One of the things that I found out is that there's a need to repair the dichotomy between the mind and body as well. My mother always told me, you have to be observant, know what's going on in your mind and your body, and as a dancer I had this tremendous faith in my ability to know my body, just as I knew how to sense colors. Then I went to medical school, and I was supposed to just go on what the machine said about bodies. You know, you would ask patients questions and some people would tell you, "Don't, don't, don't listen to what the patients said." We know that patients know and understand their bodies better, but these days we're trying to divorce them from that idea. We have to reconcile the patient's knowledge of their body with physician's measurements. We had someone talk about measuring emotions and getting machines to figure out what, to keep us from acting crazy. Right? No, we shouldn't measure, we shouldn't use machines to measure road rage and then do something to keep us from engaging in it. 
Maybe we can have machines help us to recognize that we have road rage and then we need to know how to control that without the machines. We even need to be able to recognize that without the machines. What I'm very concerned about is how do we bolster our self-awareness as humans, as biological organisms? Michael Moschen spoke of having to teach and learn how to feel with my eyes, to see with my hands. We have all kinds of possibilities to use our senses by, and that's what we have to do. That's what I want to do, is to try to use bioinstrumentation, those kind of things to help our senses in what we do, and that's the work I've been doing now as a company called BioSentient Corporation. I figured I'd have to do that ad, because I'm an entrepreneur, because entrepreneur says that that's somebody who does what they want to do because they're not broke enough that they have to get a real job. (Laughter) But that's the work I'm doing with BioSentient Corporation trying to figure out how do we integrate these things? Let me finish by saying that my personal design issue for the future is really about integrating, to think about that intuitive and that analytical. The arts and sciences are not separate. High school physics lesson before you leave. High school physics teacher used to hold up a ball. She would say this ball has potential energy, but nothing will happen to it, it can't do any work until I drop it and it changes states. I like to think of ideas as potential energy. They're really wonderful, but nothing will happen until we risk putting them into action. This conference is filled with wonderful ideas. We're going to share lots of things with people, but nothing's going to happen until we risk putting those ideas into action. We need to revitalize the arts and sciences of today, we need to take responsibility for the future. We can't hide behind saying it's just for company profits, or it's just a business, or I'm an artist or an academician. 
Here's how you judge what you're doing. I talked about that balance between intuitive, analytical. Fran Lebowitz, my favorite cynic, she said the three questions of greatest concern, now I'm going to add on to design, is, "Is it attractive?" That's the intuitive. "Is it amusing?" The analytical. "And does it know its place?" The balance. Thank you very much. (Applause)
    wc      wpm
1 3661 247.3649

The speaker is Mae Jemison, with the talk “Teach arts and sciences together” at the TED2002 conference. We should take this result with a pinch of salt: I went ahead and watched the talk, and she didn’t really seem to speak that fast.
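One quick sanity check on a figure like this is to work backwards from the printed values and to see how much non-spoken text ends up in the word count. The sketch below is only illustrative: the helper names `count_words` and `strip_directions` are made up here, the sample string is a tiny excerpt, and it assumes `wpm` was computed as word count divided by duration in minutes.

```r
# Implied duration from the printed values, assuming wpm = wc / duration (minutes)
implied_duration <- 3661 / 247.3649

# Tiny excerpt with stage directions as they appear in the transcript
transcript <- "Now, my mother helped me figure that one out. (Laughter) Thank you very much. (Applause)"

# Count whitespace-separated tokens in a single string (hypothetical helper)
count_words <- function(x) {
  tokens <- strsplit(trimws(x), "\\s+")[[1]]
  sum(nzchar(tokens))
}

# Drop parenthesised stage directions such as (Laughter) or (Applause) (hypothetical helper)
strip_directions <- function(x) gsub("\\([A-Z][A-Za-z ]*\\)", " ", x)

wc_raw   <- count_words(transcript)
wc_clean <- count_words(strip_directions(transcript))

print(round(implied_duration, 1))
print(c(raw = wc_raw, clean = wc_clean))
```

Stage directions only account for a handful of tokens, so they are unlikely to explain the whole gap. Comparing the implied duration with the actual running time of the video is probably the more informative check.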

Finally, in this section, I’d like to see if there is any correlation between words per minute and popularity.

library(ggExtra)  # provides ggMarginal()
ggMarginal(
  ggplot(filter(df3, duration < 25), aes(x = wpm, y = views)) + geom_point()
)

cor(df3[, c("wpm","views")])
             wpm      views
wpm   1.00000000 0.01311274
views 0.01311274 1.00000000

Again, there is practically no correlation. If you are going to give a TED Talk, you probably shouldn’t worry if you’re speaking a little faster or a little slower than usual.
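Since view counts are heavily right-skewed, Pearson’s correlation on the raw values can be dominated by a few viral talks. A rank-based (Spearman) correlation, or a correlation on log-transformed views, is a reasonable cross-check. The sketch below uses synthetic stand-ins for `wpm` and `views`, not the real `df3`, so the numbers it prints say nothing about TED; it only shows the calls.

```r
set.seed(42)
n <- 200

# Synthetic stand-ins for df3$wpm and df3$views (NOT the real data)
wpm   <- rnorm(n, mean = 150, sd = 25)
views <- exp(rnorm(n, mean = 13.5, sd = 1))  # heavy right tail, like view counts

# Pearson on raw values, Pearson on log10(views), and Spearman on ranks
pearson      <- cor(wpm, views)
pearson_log  <- cor(wpm, log10(views))
spearman     <- cor(wpm, views, method = "spearman")
spearman_log <- cor(wpm, log10(views), method = "spearman")

print(c(pearson = pearson, pearson_log = pearson_log, spearman = spearman))
```

Spearman’s coefficient depends only on ranks, so it is unchanged by the log transform. On the real data the equivalent call would be `cor(df3$wpm, df3$views, method = "spearman")`.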


TED Ratings

TED allows its users to rate a particular talk on a variety of metrics. We therefore have data on how many people found a particular talk funny, inspiring, creative and a myriad of other adjectives. Let us inspect what this ratings dictionary actually looks like.

df[2, 'ratings']
# A tibble: 1 x 1
                                                                                                         ratings
                                                                                                           <chr>
1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'id': 3, 'name': 'Courageous', 'count': 139}, {'id': 2, 'name': 'C

# Lookup table from rating id to rating name
rat_levels <- c(`1` = "Beautiful", `2` = "Confusing", `3` = "Courageous", 
                `7` = "Funny", `8` = "Informative", `9` = "Ingenious", 
                `10` = "Inspiring", `11` = "Longwinded", `21` = "Unconvincing", 
                `22` = "Fascinating", `23` = "Jaw-dropping", `24` = "Persuasive", 
                `25` = "OK", `26` = "Obnoxious")

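library(RJSONIO)
# Parse each talk's ratings string into a data frame with one row per rating
# category, then look up the rating name from its id
rating_list <- lapply(df$ratings, function(s)
  do.call("rbind", lapply(fromJSON(s), function(y) do.call("cbind", y))) %>%
  as.data.frame() %>%
  mutate(rating = rat_levels[as.character(id)])
)
# Stack the per-talk data frames, tagging every row with the talk title,
# then join the talk metadata back on by title
rating_df <- do.call("rbind", Map(function(title, rating) 
  data.frame(title, rating, stringsAsFactors = FALSE),
  df$title, rating_list)) %>%
  merge(df, by = "title")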
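The ratings strings use single quotes (Python-style), which a strict JSON parser may reject. As a dependency-free alternative, here is a minimal sketch that pulls the fields out with a regular expression. The function name `parse_ratings` is made up for this example, and it assumes every entry has exactly the `'id'`, `'name'`, `'count'` layout shown in the output above.

```r
# First two entries of the ratings string printed above
s <- "[{'id': 7, 'name': 'Funny', 'count': 544}, {'id': 3, 'name': 'Courageous', 'count': 139}]"

# Hypothetical helper: one row per rating entry, assuming the fixed
# {'id': ..., 'name': '...', 'count': ...} layout
parse_ratings <- function(s) {
  pattern <- "\\{'id': ([0-9]+), 'name': '([^']+)', 'count': ([0-9]+)\\}"
  # All full matches in the string
  entries <- regmatches(s, gregexpr(pattern, s))[[1]]
  # For each match: full match followed by the three captured groups
  parts <- regmatches(entries, regexec(pattern, entries))
  data.frame(
    id     = as.integer(vapply(parts, `[`, character(1), 2)),
    rating = vapply(parts, `[`, character(1), 3),
    count  = as.numeric(vapply(parts, `[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_ratings(s)
print(parsed)
```

Because the rating name is captured directly from the string, this variant does not need the `rat_levels` lookup. It would break on names containing an apostrophe, so treat it as a sketch rather than a robust parser.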

Funniest Talks of all time

rating_df %>%
  filter(rating == "Funny") %>%
  select(title, main_speaker, views, published_date, count) %>%
  group_by(title) %>%
  arrange(desc(count))
# A tibble: 2,550 x 5
# Groups:   title [2,550]
                                               title       main_speaker    views published_date count
                                               <chr>              <chr>    <int>         <date> <dbl>
 1                       Do schools kill creativity?       Ken Robinson 47227110     2006-06-27 19645
 2 This is what happens when you reply to spam email       James Veitch 20475972     2016-01-08  7731
 3        Inside the mind of a master procrastinator          Tim Urban 14745406     2016-03-15  7445
 4                   The happy secret to better work        Shawn Achor 16209727     2012-02-01  7315
 5 Lies, damned lies and statistics (about TEDTalks) Sebastian Wernicke  2212944     2010-04-30  5552
 6                        The power of vulnerability        Brené Brown 31168150     2010-12-23  5225
 7            10 things you didn't know about orgasm         Mary Roach 22270883     2009-05-20  4166
 8                      "It's time for \"The Talk\""      Julia Sweeney  3362099     2010-05-14  4025
 9  Did you hear the one about the Iranian-American?        Maz Jobrani  4646183     2010-08-19  4013
10                 Bring on the learning revolution!       Ken Robinson  7266316     2010-05-24  3000
# ... with 2,540 more rows

Most Beautiful Talks of all time

rating_df %>%
  filter(rating == "Beautiful") %>%
  select(title, main_speaker, views, published_date, count) %>%
  group_by(title) %>%
  arrange(desc(count))
# A tibble: 2,550 x 5
# Groups:   title [2,550]
                                         title             main_speaker    views published_date count
                                         <chr>                    <chr>    <int>         <date> <dbl>
 1                        My stroke of insight        Jill Bolte Taylor 21190883     2008-03-12  9437
 2                  The power of vulnerability              Brené Brown 31168150     2010-12-23  7942
 3                  Building a park in the sky           Robert Hammond   704205     2011-06-30  6685
 4 The transformative power of classical music          Benjamin Zander  9315483     2008-06-25  5967
 5                The danger of a single story Chimamanda Ngozi Adichie 13298341     2009-10-07  5607
 6                    Underwater astonishments              David Gallo 13926113     2008-01-11  5201
 7                 Do schools kill creativity?             Ken Robinson 47227110     2006-06-27  4573
 8             If I should have a daughter ...                Sarah Kay 10529854     2011-03-18  4430
 9                  Nature. Beauty. Gratitude.       Louie Schwartzberg  3658158     2012-11-22  4399
10                Your elusive creative genius        Elizabeth Gilbert 13155478     2009-02-09  4027
# ... with 2,540 more rows

Most Jaw-dropping Talks of all time

rating_df %>%
  filter(rating == "Jaw-dropping") %>%
  select(title, main_speaker, views, published_date, count) %>%
  group_by(title) %>%
  arrange(desc(count))
# A tibble: 2,550 x 5
# Groups:   title [2,550]
                                              title          main_speaker    views published_date count
                                              <chr>                 <chr>    <int>         <date> <dbl>
 1    How PhotoSynth can connect the world's images Blaise Agüera y Arcas  4772595     2007-05-27 14728
 2                             My stroke of insight     Jill Bolte Taylor 21190883     2008-03-12 10464
 3 The thrilling potential of SixthSense technology         Pranav Mistry 16097077     2009-11-16  8416
 4                         Underwater astonishments           David Gallo 13926113     2008-01-11  8328
 5                "A performance of \"Mathemagic\""       Arthur Benjamin  8360707     2007-12-13  7196
 6                          New insights on poverty          Hans Rosling  3243784     2007-06-25  5137
 7                                   This is Saturn         Carolyn Porco  2627709     2007-10-01  4971
 8 The radical promise of the multi-touch interface              Jeff Han  4531020     2006-08-01  4643
 9                      Do schools kill creativity?          Ken Robinson 47227110     2006-06-27  4439
10                  The best stats you've ever seen          Hans Rosling 12005869     2006-06-27  3736
# ... with 2,540 more rows

Most Confusing Talks of all time

rating_df %>%
  filter(rating == "Confusing") %>%
  select(title, main_speaker, views, published_date, count) %>%
  group_by(title) %>%
  arrange(desc(count))
# A tibble: 2,550 x 5
# Groups:   title [2,550]
                                    title      main_speaker    views published_date count
                                    <chr>             <chr>    <int>         <date> <dbl>
 1 I believe we evolved from aquatic apes     Elaine Morgan  1038576     2009-07-31   531
 2 An 8-dimensional model of the universe      Garrett Lisi  1491698     2008-10-14   376
 3                   Why we do what we do      Tony Robbins 20685401     2006-06-27   301
 4                   My stroke of insight Jill Bolte Taylor 21190883     2008-03-12   289
 5                      The call to learn    Clifford Stoll  2283491     2008-03-26   278
 6                     Design and destiny   Philippe Starck  1783740     2007-12-04   276
 7                            Brain magic       Keith Barry 13327101     2008-07-18   273
 8  17 words of architectural inspiration  Daniel Libeskind   784642     2009-07-01   244
 9            Do schools kill creativity?      Ken Robinson 47227110     2006-06-27   242
10    The surprising science of happiness       Dan Gilbert 14689301     2006-09-26   241
# ... with 2,540 more rows
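Raw rating counts naturally favour talks with many views, which may be why “Do schools kill creativity?” shows up in all four lists. One possible refinement is to rank by ratings per 1,000 views instead. The sketch below uses three rows copied from the Funny table above; the column name `per_1000_views` is an assumption introduced for illustration.

```r
# Three rows taken from the "Funniest Talks" output above
funny <- data.frame(
  title = c("Do schools kill creativity?",
            "This is what happens when you reply to spam email",
            "Lies, damned lies and statistics (about TEDTalks)"),
  views = c(47227110, 20475972, 2212944),
  count = c(19645, 7731, 5552),
  stringsAsFactors = FALSE
)

# "Funny" votes per 1,000 views (hypothetical normalised metric)
funny$per_1000_views <- 1000 * funny$count / funny$views

# Rank by the normalised metric instead of the raw count
ranked <- funny[order(-funny$per_1000_views), ]
print(ranked[, c("title", "per_1000_views")])
```

With this normalisation the Sebastian Wernicke talk moves ahead of the two talks that lead on raw count. On the full data the same idea would be a `mutate()` followed by `arrange(desc(...))` on `rating_df`.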

The TED Word Cloud

I was curious about which words are most often used by TED Speakers. Could we create a Word Cloud out of all TED Speeches?

I tried to create a word cloud following the tutorial on r-bloggers.

library(tm)
library(SnowballC)
library(wordcloud)

texts <- df3$transcript
corpus <- Corpus(VectorSource(texts)) %>%
  tm_map(PlainTextDocument) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, stopwords('english')) %>%
  tm_map(stemDocument) %>%
  tm_map(removeWords, c("and", "this", "there"))

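# Re-wrap the processed documents in a fresh corpus before building the matrix
corpus <- Corpus(VectorSource(corpus)) 
m <- TermDocumentMatrix(corpus) %>% as.matrix()
# Row sums give the total frequency of each term across all talks
v <- sort(rowSums(m), decreasing = TRUE)
# Drop a few very common words that are still present in the term list
d <- data.frame(word = names(v), freq = v) %>%
  filter(!(word %in% c("and","this","that")))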
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35, 
          colors = brewer.pal(8, "Dark2"))
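To make it clearer what the word/frequency table `d` contains, here is a minimal base R sketch of the same idea on two made-up sentences: lowercase, strip punctuation, drop stop words, count. The stop word list is a tiny hypothetical one, not `stopwords('english')`, and no stemming is applied.

```r
# Two made-up example "documents"
texts <- c("Ideas are potential energy.",
           "Nothing will happen until we risk putting ideas into action.")

# Tiny hypothetical stop word list, for illustration only
stop_words <- c("are", "will", "until", "we", "into")

# Lowercase, remove punctuation, split on whitespace
tokens <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", texts)), "\\s+"))

# Drop empty tokens and stop words
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]

# Term frequencies, most frequent first, in the same shape as `d`
freq  <- sort(table(tokens), decreasing = TRUE)
d_toy <- data.frame(word = names(freq), freq = as.integer(freq),
                    stringsAsFactors = FALSE)
print(head(d_toy))
```

Lowercasing before removing stop words matters here: a capitalised “And” or “This” at the start of a sentence would not match a lowercase stop word list.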

An interactive alternative

library(wordcloud2)
wordcloud2(data = d)
# Why not?
letterCloud(d, word = "TED")