Tomer Eldor January 26, 2018
Have you ever wondered what makes some TED talks more popular than others? I analyzed a dataset of 2550 TED talks to look for answers. My aim was to explore which of the available variables of a given talk, such as the number of comments, number of languages translated, duration of the talk, number of tags, or day it was published online, are strong predictors of its popularity, measured in number of views.
I don’t believe these analyses support causal inference: the talks could not be matched on these variables, so their explanatory power isn’t rigorous enough. The numerical parameters I had in hand are not sufficient for that kind of conclusion. I couldn’t match on the content that really matters in order to compare apples to apples, and even when controlling for other variables with multiple regression, not all else is equal (the ceteris paribus assumption is still not met). However, I was able to get a decent predictor and to understand which variables are most strongly associated with higher view counts.
What data did I have? The dataset includes the name, title, description and URL of each of the 2550 talks, the name and occupation of the main speaker, the number of speakers, the duration of the talk, the TED event and date it was filmed at, the date it was published online, and the number of comments, languages translated and views. It also includes the associated tags, ratings, and related talks, but these are stored as arrays inside a single field and need transformation before they can be used. For a full list, see comment.
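As a rough sketch of the kind of transformation the array-like fields need, the number of tags per talk can be recovered by counting the quoted items in each `tags` string. The helper name `count_tags` is hypothetical (it is not part of the original analysis), and the pattern assumes the Python-style list format that appears in the data, with items wrapped in single or double quotes.

```r
# Hypothetical helper: count quoted items in a Python-style list string,
# e.g. "['art', 'creativity']" has two items. Assumes each item is wrapped
# in single or double quotes and does not contain its own wrapping quote.
count_tags <- function(tag_string) {
  tag_string <- as.character(tag_string)  # factor columns become plain text
  quoted_items <- gregexpr("'[^']*'|\"[^\"]*\"", tag_string)
  lengths(regmatches(tag_string, quoted_items))
}

example_tags <- c("['art', 'creativity']",
                  "['live music', 'music', 'performance']",
                  "[]")
count_tags(example_tags)
```

Once the data is loaded, something like `ted$num_tags <- count_tags(ted$tags)` would add the count as a column (`num_tags` is again just an illustrative name).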
library(ggplot2) # Data visualisation
library(reshape2)
library(corrplot)
library(dplyr)
library(stringr) # String manipulation
library(anytime)
library(data.table)
ted = read.csv("./data/ted_main.csv",header=TRUE,stringsAsFactors = TRUE)
transcripts=read.csv("./data/transcripts.csv",header=TRUE,stringsAsFactors = FALSE)
Here are some summary statistics: a few tables and graphs that describe the data.
Let’s examine the distributions of the key parameters that we’ll be using. The faceted plot shows histograms of most of our numerical variables. The more detailed histograms that follow mark the median with a white line and the mean with a blue line.
print(summary(ted))
## comments
## Min. : 2.0
## 1st Qu.: 63.0
## Median : 118.0
## Mean : 191.6
## 3rd Qu.: 221.8
## Max. :6404.0
##
## description
## 'I am a mathematician, and I would like to stand on your roof.' That is how Ron Eglash greeted many African families he met while researching the fractal patterns he'd noticed in villages across the continent. : 1
## "A forest is much more than what you see," says ecologist Suzanne Simard. Her 30 years of research in Canadian forests have led to an astounding discovery -- trees talk, often and over vast distances. Learn more about the harmonious yet complicated social lives of trees and prepare to see the natural world with new eyes. : 1
## "An advanced city is not one where even the poor use cars, but rather one where even the rich use public transport," argues Enrique Peñalosa. In this spirited talk, the mayor of Bogotá shares some of the tactics he used to change the transportation dynamic in the Colombian capital... and suggests ways to think about building smart cities of the future. : 1
## "Anything that is worth pursuing is going to require us to suffer, just a little bit," says surf photographer Chris Burkard, as he explains his obsession with the coldest, choppiest, most isolated beaches on earth. With jawdropping photos and stories of places few humans have ever seen -- much less surfed -- he draws us into his "personal crusade against the mundane." : 1
## "Architecture is not about math or zoning -- it's about visceral emotions," says Marc Kushner. In a sweeping — often funny — talk, he zooms through the past thirty years of architecture to show how the public, once disconnected, have become an essential part of the design process. With the help of social media, feedback reaches architects years before a building is even created. The result? Architecture that will do more for us than ever before.: 1
## "Babies and young children are like the R&D division of the human species," says psychologist Alison Gopnik. Her research explores the sophisticated intelligence-gathering and decision-making that babies are really doing when they play. : 1
## (Other) :2544
## duration event film_date languages
## Min. : 135.0 TED2014: 84 Min. :7.465e+07 Min. : 0.00
## 1st Qu.: 577.0 TED2009: 83 1st Qu.:1.257e+09 1st Qu.:23.00
## Median : 848.0 TED2013: 77 Median :1.333e+09 Median :28.00
## Mean : 826.5 TED2016: 77 Mean :1.322e+09 Mean :27.33
## 3rd Qu.:1046.8 TED2015: 75 3rd Qu.:1.413e+09 3rd Qu.:33.00
## Max. :5256.0 TED2011: 70 Max. :1.504e+09 Max. :72.00
## (Other):2084
## main_speaker
## Hans Rosling : 9
## Juan Enriquez: 7
## Marco Tempest: 6
## Rives : 6
## Bill Gates : 5
## Clay Shirky : 5
## (Other) :2512
## name
## Aakash Odedra: A dance in a hurricane of paper, wind and light : 1
## Aala El-Khani: What it's like to be a parent in a war zone : 1
## Aaron Huey: America's native prisoners of war : 1
## Aaron Koblin: Visualizing ourselves ... with crowd-sourced data : 1
## Aaron O'Connell: Making sense of a visible quantum object : 1
## Abe Davis: New video technology that reveals an object's hidden properties: 1
## (Other) :2544
## num_speaker published_date
## Min. :1.000 Min. :1.151e+09
## 1st Qu.:1.000 1st Qu.:1.268e+09
## Median :1.000 Median :1.341e+09
## Mean :1.028 Mean :1.344e+09
## 3rd Qu.:1.000 3rd Qu.:1.423e+09
## Max. :5.000 Max. :1.506e+09
##
## ratings
## [{'id': 1, 'name': 'Beautiful', 'count': 100}, {'id': 9, 'name': 'Ingenious', 'count': 97}, {'id': 3, 'name': 'Courageous', 'count': 78}, {'id': 8, 'name': 'Informative', 'count': 313}, {'id': 10, 'name': 'Inspiring', 'count': 518}, {'id': 21, 'name': 'Unconvincing', 'count': 30}, {'id': 22, 'name': 'Fascinating', 'count': 158}, {'id': 11, 'name': 'Longwinded', 'count': 31}, {'id': 7, 'name': 'Funny', 'count': 12}, {'id': 25, 'name': 'OK', 'count': 59}, {'id': 23, 'name': 'Jaw-dropping', 'count': 72}, {'id': 24, 'name': 'Persuasive', 'count': 174}, {'id': 26, 'name': 'Obnoxious', 'count': 14}, {'id': 2, 'name': 'Confusing', 'count': 15}] : 1
## [{'id': 1, 'name': 'Beautiful', 'count': 101}, {'id': 3, 'name': 'Courageous', 'count': 141}, {'id': 10, 'name': 'Inspiring', 'count': 252}, {'id': 11, 'name': 'Longwinded', 'count': 20}, {'id': 8, 'name': 'Informative', 'count': 92}, {'id': 22, 'name': 'Fascinating', 'count': 36}, {'id': 2, 'name': 'Confusing', 'count': 15}, {'id': 24, 'name': 'Persuasive', 'count': 39}, {'id': 21, 'name': 'Unconvincing', 'count': 10}, {'id': 26, 'name': 'Obnoxious', 'count': 11}, {'id': 9, 'name': 'Ingenious', 'count': 19}, {'id': 25, 'name': 'OK', 'count': 28}, {'id': 23, 'name': 'Jaw-dropping', 'count': 7}, {'id': 7, 'name': 'Funny', 'count': 5}] : 1
## [{'id': 1, 'name': 'Beautiful', 'count': 101}, {'id': 3, 'name': 'Courageous', 'count': 84}, {'id': 10, 'name': 'Inspiring', 'count': 199}, {'id': 22, 'name': 'Fascinating', 'count': 83}, {'id': 9, 'name': 'Ingenious', 'count': 61}, {'id': 8, 'name': 'Informative', 'count': 67}, {'id': 24, 'name': 'Persuasive', 'count': 13}, {'id': 11, 'name': 'Longwinded', 'count': 2}, {'id': 25, 'name': 'OK', 'count': 10}, {'id': 7, 'name': 'Funny', 'count': 2}, {'id': 21, 'name': 'Unconvincing', 'count': 2}, {'id': 23, 'name': 'Jaw-dropping', 'count': 23}, {'id': 2, 'name': 'Confusing', 'count': 4}, {'id': 26, 'name': 'Obnoxious', 'count': 1}] : 1
## [{'id': 1, 'name': 'Beautiful', 'count': 101}, {'id': 7, 'name': 'Funny', 'count': 306}, {'id': 9, 'name': 'Ingenious', 'count': 106}, {'id': 11, 'name': 'Longwinded', 'count': 54}, {'id': 3, 'name': 'Courageous', 'count': 39}, {'id': 10, 'name': 'Inspiring', 'count': 230}, {'id': 8, 'name': 'Informative', 'count': 360}, {'id': 21, 'name': 'Unconvincing', 'count': 184}, {'id': 22, 'name': 'Fascinating', 'count': 298}, {'id': 2, 'name': 'Confusing', 'count': 34}, {'id': 25, 'name': 'OK', 'count': 66}, {'id': 23, 'name': 'Jaw-dropping', 'count': 54}, {'id': 26, 'name': 'Obnoxious', 'count': 47}, {'id': 24, 'name': 'Persuasive', 'count': 29}]: 1
## [{'id': 1, 'name': 'Beautiful', 'count': 102}, {'id': 3, 'name': 'Courageous', 'count': 103}, {'id': 23, 'name': 'Jaw-dropping', 'count': 40}, {'id': 10, 'name': 'Inspiring', 'count': 236}, {'id': 11, 'name': 'Longwinded', 'count': 18}, {'id': 25, 'name': 'OK', 'count': 22}, {'id': 21, 'name': 'Unconvincing', 'count': 18}, {'id': 26, 'name': 'Obnoxious', 'count': 12}, {'id': 22, 'name': 'Fascinating', 'count': 69}, {'id': 24, 'name': 'Persuasive', 'count': 64}, {'id': 9, 'name': 'Ingenious', 'count': 2}, {'id': 8, 'name': 'Informative', 'count': 80}, {'id': 2, 'name': 'Confusing', 'count': 8}, {'id': 7, 'name': 'Funny', 'count': 16}] : 1
## [{'id': 1, 'name': 'Beautiful', 'count': 102}, {'id': 8, 'name': 'Informative', 'count': 191}, {'id': 24, 'name': 'Persuasive', 'count': 124}, {'id': 3, 'name': 'Courageous', 'count': 87}, {'id': 10, 'name': 'Inspiring', 'count': 107}, {'id': 22, 'name': 'Fascinating', 'count': 109}, {'id': 9, 'name': 'Ingenious', 'count': 58}, {'id': 26, 'name': 'Obnoxious', 'count': 8}, {'id': 2, 'name': 'Confusing', 'count': 2}, {'id': 25, 'name': 'OK', 'count': 17}, {'id': 7, 'name': 'Funny', 'count': 16}, {'id': 23, 'name': 'Jaw-dropping', 'count': 14}, {'id': 11, 'name': 'Longwinded', 'count': 4}, {'id': 21, 'name': 'Unconvincing', 'count': 4}] : 1
## (Other) :2544
## related_talks
## [{'id': 1, 'hero': 'https://pe.tedcdn.com/images/ted/a2194b4ef5170bd9e9b086c5053e09cdf3545960_2880x1620.jpg', 'speaker': 'Al Gore', 'title': 'Averting the climate crisis', 'duration': 977, 'slug': 'al_gore_on_averting_climate_crisis', 'viewed_count': 3200590}, {'id': 346, 'hero': 'https://pe.tedcdn.com/images/ted/53153_480x360.jpg', 'speaker': 'Brewster Kahle', 'title': 'A free digital library', 'duration': 1206, 'slug': 'brewster_kahle_builds_a_free_digital_library', 'viewed_count': 445784}, {'id': 1410, 'hero': 'https://pe.tedcdn.com/images/ted/ec8292db13815de32529bf4568ce76b48a089bc5_2880x1620.jpg', 'speaker': 'Chip Kidd', 'title': 'Designing books is no laughing matter. OK, it is.', 'duration': 1036, 'slug': 'chip_kidd_designing_books_is_no_laughing_matter_ok_it_is', 'viewed_count': 1752006}, {'id': 2180, 'hero': 'https://pe.tedcdn.com/images/ted/2ff057a4bb41b2246c65370cf104a8ffa0dfc75a_2880x1620.jpg', 'speaker': 'Brian Dettmer', 'title': 'Old books reborn as art', 'duration': 366, 'slug': 'brian_dettmer_old_books_reborn_as_intricate_art', 'viewed_count': 1159918}, {'id': 2382, 'hero': 'https://pe.tedcdn.com/images/ted/f0492898d0b7658a39ac9348f555f11e936b5068_2880x1620.jpg', 'speaker': 'Ann Morgan', 'title': 'My year reading a book from every country in the world', 'duration': 723, 'slug': 'ann_morgan_my_year_reading_a_book_from_every_country_in_the_world', 'viewed_count': 1446231}, {'id': 1366, 'hero': 'https://pe.tedcdn.com/images/ted/7e5896a6ead7ad1d0f09c608faf7fa1b090a8dff_800x600.jpg', 'speaker': 'Shilo Shiv Suleman', 'title': 'Using tech to enable dreaming', 'duration': 456, 'slug': 'shilo_shiv_suleman_using_tech_to_enable_dreaming', 'viewed_count': 576917}] : 1
## [{'id': 1, 'hero': 'https://pe.tedcdn.com/images/ted/a2194b4ef5170bd9e9b086c5053e09cdf3545960_2880x1620.jpg', 'speaker': 'Al Gore', 'title': 'Averting the climate crisis', 'duration': 977, 'slug': 'al_gore_on_averting_climate_crisis', 'viewed_count': 3200659}, {'id': 628, 'hero': 'https://pe.tedcdn.com/images/ted/113028_800x600.jpg', 'speaker': 'James Balog', 'title': 'Time-lapse proof of extreme ice loss', 'duration': 1162, 'slug': 'james_balog_time_lapse_proof_of_extreme_ice_loss', 'viewed_count': 914178}, {'id': 1380, 'hero': 'https://pe.tedcdn.com/images/ted/3d552c9054e770953216a6b5aab1b748b3e5f823_2880x1620.jpg', 'speaker': 'James Hansen', 'title': 'Why I must speak out about climate change', 'duration': 1071, 'slug': 'james_hansen_why_i_must_speak_out_about_climate_change', 'viewed_count': 1243947}, {'id': 2816, 'hero': 'https://pe.tedcdn.com/images/ted/07284d43b7d248835825748dc24c63e4b83478c9_2880x1620.jpg', 'speaker': 'Kate Marvel', 'title': 'Can clouds buy us more time to solve climate change?', 'duration': 787, 'slug': 'kate_marvel_can_clouds_buy_us_more_time_to_solve_climate_change', 'viewed_count': 907550}, {'id': 192, 'hero': 'https://pe.tedcdn.com/images/ted/2412c9764c6b9bed2d9bd55ec7a6b9e540297839_1600x1200.jpg', 'speaker': 'David Keith', 'title': 'A critical look at geoengineering against climate change', 'duration': 958, 'slug': 'david_keith_s_surprising_ideas_on_climate_change', 'viewed_count': 876780}, {'id': 945, 'hero': 'https://pe.tedcdn.com/images/ted/195010_800x600.jpg', 'speaker': 'Johan Rockstrom', 'title': 'Let the environment guide our development', 'duration': 1090, 'slug': 'johan_rockstrom_let_the_environment_guide_our_development', 'viewed_count': 924840}]: 1
## [{'id': 10, 'hero': 'https://pe.tedcdn.com/images/ted/671d08b846a625bc2fdee1b6e3c7326bd9c3c48c_1600x1200.jpg', 'speaker': 'Dean Ornish', 'title': "The killer American diet that's sweeping the planet", 'duration': 198, 'slug': 'dean_ornish_on_the_world_s_killer_diet', 'viewed_count': 2299281}, {'id': 263, 'hero': 'https://pe.tedcdn.com/images/ted/e53dd444ed289f36275590ba3c93ddc161cf83fa_1600x1200.jpg', 'speaker': 'Mark Bittman', 'title': "What's wrong with what we eat", 'duration': 1208, 'slug': 'mark_bittman_on_what_s_wrong_with_what_we_eat', 'viewed_count': 3830694}, {'id': 348, 'hero': 'https://pe.tedcdn.com/images/ted/aaeea21df3e61b5ecb347f5b024e519b178547ef_1600x1200.jpg', 'speaker': 'Ann Cooper', 'title': "What's wrong with school lunches", 'duration': 1182, 'slug': 'ann_cooper_talks_school_lunches', 'viewed_count': 1137781}, {'id': 2659, 'hero': 'https://pe.tedcdn.com/images/ted/500c56050b598bbd5837bc0d378a5ba6301e71c0_2880x1620.jpg', 'speaker': 'Sam Kass', 'title': 'Want kids to learn well? Feed them well', 'duration': 728, 'slug': 'sam_kass_want_to_teach_kids_well_feed_them_well', 'viewed_count': 1345260}, {'id': 910, 'hero': 'https://pe.tedcdn.com/images/ted/181989_800x600.jpg', 'speaker': 'Ellen Gustafson', 'title': 'Obesity + hunger = 1 global food issue', 'duration': 675, 'slug': 'ellen_gustafson_obesity_hunger_1_global_food_issue', 'viewed_count': 664485}, {'id': 1199, 'hero': 'https://pe.tedcdn.com/images/ted/e975f00df4827c6fbfc27ad3ffa820f1c609ce25_800x600.jpg', 'speaker': 'Josette Sheeran', 'title': 'Ending hunger now', 'duration': 1150, 'slug': 'josette_sheeran_ending_hunger_now', 'viewed_count': 812506}] : 1
## [{'id': 1000, 'hero': 'https://pe.tedcdn.com/images/ted/51f652b9ff6854867d1d7abb2683caf1d8dd22fb_800x600.jpg', 'speaker': 'Gero Miesenboeck', 'title': 'Re-engineering the brain', 'duration': 1054, 'slug': 'gero_miesenboeck', 'viewed_count': 611083}, {'id': 724, 'hero': 'https://pe.tedcdn.com/images/ted/138921_800x600.jpg', 'speaker': 'Vilayanur Ramachandran', 'title': 'The neurons that shaped civilization', 'duration': 463, 'slug': 'vs_ramachandran_the_neurons_that_shaped_civilization', 'viewed_count': 1939394}, {'id': 659, 'hero': 'https://pe.tedcdn.com/images/ted/121608_800x600.jpg', 'speaker': 'Henry Markram', 'title': 'A brain in a supercomputer', 'duration': 890, 'slug': 'henry_markram_supercomputing_the_brain_s_secrets', 'viewed_count': 1180135}, {'id': 1450, 'hero': 'https://pe.tedcdn.com/images/ted/9d94c91442c28bfda8fc51070e63397867e1f787_800x600.jpg', 'speaker': 'Carl Schoonover', 'title': 'How to look inside the brain', 'duration': 292, 'slug': 'carl_schoonover_how_to_look_inside_the_brain', 'viewed_count': 890280}, {'id': 1349, 'hero': 'https://pe.tedcdn.com/images/ted/b6441a3ec7fe888555c728e011b6bd3650544fb8_800x600.jpg', 'speaker': 'Neil Burgess', 'title': 'How your brain tells you where you are', 'duration': 543, 'slug': 'neil_burgess_how_your_brain_tells_you_where_you_are', 'viewed_count': 1196431}, {'id': 2429, 'hero': 'https://pe.tedcdn.com/images/ted/0fc9a0c0e51919aa2f0efe825dbea9d4480baaf7_2880x1620.jpg', 'speaker': 'Jocelyne Bloch', 'title': 'The brain may be able to repair itself -- with help', 'duration': 694, 'slug': 'jocelyne_bloch_the_brain_may_be_able_to_repair_itself_with_help', 'viewed_count': 2152829}] : 1
## [{'id': 1000, 'hero': 'https://pe.tedcdn.com/images/ted/51f652b9ff6854867d1d7abb2683caf1d8dd22fb_800x600.jpg', 'speaker': 'Gero Miesenboeck', 'title': 'Re-engineering the brain', 'duration': 1054, 'slug': 'gero_miesenboeck', 'viewed_count': 611085}, {'id': 1674, 'hero': 'https://pe.tedcdn.com/images/ted/358f8320e4a3b2a9f21dc6ae0585b8ddc071c1a5_1600x1200.jpg', 'speaker': 'Michael Dickinson', 'title': 'How a fly flies', 'duration': 955, 'slug': 'michael_dickinson_how_a_fly_flies', 'viewed_count': 1555723}, {'id': 1879, 'hero': 'https://pe.tedcdn.com/images/ted/f8443ec450dd71d590ea1b0f7b9470dd5665b15b_2880x1620.jpg', 'speaker': 'Suzana Herculano-Houzel', 'title': 'What is so special about the human brain?', 'duration': 811, 'slug': 'suzana_herculano_houzel_what_is_so_special_about_the_human_brain', 'viewed_count': 2454843}, {'id': 967, 'hero': 'https://pe.tedcdn.com/images/ted/202580_800x600.jpg', 'speaker': 'Sebastian Seung', 'title': 'I am my connectome', 'duration': 1165, 'slug': 'sebastian_seung', 'viewed_count': 942938}, {'id': 2557, 'hero': 'https://pe.tedcdn.com/images/ted/eec0be6feefef0950540ec844664ed485ec83d8d_2880x1620.jpg', 'speaker': 'Ed Boyden', 'title': "A new way to study the brain's invisible secrets", 'duration': 795, 'slug': 'ed_boyden_baby_diapers_inspired_this_new_way_to_study_the_brain', 'viewed_count': 1260062}, {'id': 1714, 'hero': 'https://pe.tedcdn.com/images/ted/1733fa238431c2ae65c5410920b2413c2c4b8171_1600x1200.jpg', 'speaker': 'Thomas Insel', 'title': 'Toward a new understanding of mental illness', 'duration': 783, 'slug': 'thomas_insel_toward_a_new_understanding_of_mental_illness', 'viewed_count': 1188149}] : 1
## [{'id': 1001, 'hero': 'https://pe.tedcdn.com/images/ted/6f9f510c9443fa223d292d43a3e9999d8b1347cc_2880x1620.jpg', 'speaker': 'Andrew Bird', 'title': 'A one-man orchestra of the imagination', 'duration': 1159, 'slug': 'andrew_bird_s_one_man_orchestra_of_the_imagination', 'viewed_count': 1231425}, {'id': 286, 'hero': 'https://pe.tedcdn.com/images/ted/db471b06b2f5f6ba97c7de8b232878ffe9718600_1600x1200.jpg', 'speaker': 'Benjamin Zander', 'title': 'The transformative power of classical music', 'duration': 1243, 'slug': 'benjamin_zander_on_music_and_passion', 'viewed_count': 9315564}, {'id': 745, 'hero': 'https://pe.tedcdn.com/images/ted/143427_800x600.jpg', 'speaker': 'Sivamani', 'title': 'Rhythm is everything, everywhere', 'duration': 1000, 'slug': 'sivamani_rhythm_is_everything_everywhere', 'viewed_count': 556163}, {'id': 99, 'hero': 'https://pe.tedcdn.com/images/ted/217_480x360.jpg', 'speaker': 'Jill Sobule', 'title': 'Global warming\\'s theme song, "Manhattan in January"', 'duration': 163, 'slug': 'jill_sobule_sings_to_al_gore', 'viewed_count': 591395}, {'id': 688, 'hero': 'https://pe.tedcdn.com/images/ted/132434_800x600.jpg', 'speaker': 'Mallika Sarabhai', 'title': 'Dance to change the world', 'duration': 1012, 'slug': 'mallika_sarabhai', 'viewed_count': 481838}, {'id': 413, 'hero': 'https://pe.tedcdn.com/images/ted/61166_800x600.jpg', 'speaker': 'David Holt', 'title': 'The joyful tradition of mountain music', 'duration': 1517, 'slug': 'david_holt_plays_mountain_music', 'viewed_count': 512589}] : 1
## (Other) :2544
## speaker_occupation
## Writer : 45
## Artist : 34
## Designer : 34
## Journalist : 33
## Entrepreneur: 31
## Architect : 30
## (Other) :2343
## tags
## ['art', 'creativity'] : 3
## ['entertainment', 'live music', 'music', 'performance'] : 3
## ['live music', 'music', 'performance'] : 3
## ['architecture', 'cities', 'design', 'infrastructure'] : 2
## ['business', 'technology'] : 2
## ['charter for compassion', 'compassion', 'global issues', 'religion']: 2
## (Other) :2535
## title
## Hidden miracles of the natural world : 1
## How CRISPR lets us edit our DNA : 1
## "(Nothing But) Flowers" with string quartet: 1
## "Awoo" : 1
## "Black Men Ski" : 1
## "Clonie" : 1
## (Other) :2544
## url
## https://www.ted.com/talks/9_11_healing_the_mothers_who_found_forgiveness_friendship\n: 1
## https://www.ted.com/talks/a_choir_as_big_as_the_internet\n : 1
## https://www.ted.com/talks/a_j_jacobs_year_of_living_biblically\n : 1
## https://www.ted.com/talks/a_robot_that_flies_like_a_bird\n : 1
## https://www.ted.com/talks/a_ted_speaker_s_worst_nightmare\n : 1
## https://www.ted.com/talks/a_whistleblower_you_haven_t_heard\n : 1
## (Other) :2544
## views
## Min. : 50443
## 1st Qu.: 755793
## Median : 1124524
## Mean : 1698297
## 3rd Qu.: 1700760
## Max. :47227110
##
melted_ted <- melt(ted)
ggplot(data = melted_ted, mapping = aes(x = value)) +
geom_histogram(bins = 30) + # 30 bins showed the shape of each distribution well after trying values between 10 and 100.
# tried to force non-scientific notation, but the numbers are too long to represent, so I removed it.
#scale_x_continuous(labels = function(x) format(x, scientific = FALSE)) +
facet_wrap(~variable, scales = 'free_x')
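`melt(ted)` treats the non-numeric columns as id variables, so a lot of text is carried along just to draw histograms. As a minimal sketch (not the code used here), the numeric columns can be selected first and reshaped with base R's `stack()`. The helper name `numeric_long` and the toy data frame are made up for illustration.

```r
# Hypothetical helper: keep only numeric columns and reshape them to long
# format (one row per value), similar in spirit to melt() on those columns.
numeric_long <- function(df) {
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  long <- stack(numeric_cols)            # returns columns "values" and "ind"
  names(long) <- c("value", "variable")  # match the names melt() produces
  long
}

# Toy data frame standing in for the real dataset
toy <- data.frame(comments = c(4553L, 265L, 124L),
                  duration = c(1164, 977, 1286),
                  event = c("TED2006", "TED2006", "TED2006"))
long_toy <- numeric_long(toy)
long_toy
```

The result has the same `value` and `variable` columns that the `ggplot()` call above expects, so it could be passed to the same `facet_wrap(~variable, scales = 'free_x')` plot.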
ted$date_pub = anydate(ted$published_date)
ted$month = month(ted$date_pub)
ted$year = year(ted$date_pub)
ted$day = weekdays(ted$date_pub,abbreviate = TRUE)
head(ted,3)
## comments
## 1 4553
## 2 265
## 3 124
## description
## 1 Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.
## 2 With the same humor and humanity he exuded in "An Inconvenient Truth," Al Gore spells out 15 ways that individuals can address climate change immediately, from buying a hybrid to inventing a new, hotter brand name for global warming.
## 3 New York Times columnist David Pogue takes aim at technology’s worst interface-design offenders, and provides encouraging examples of products that get it right. To funny things up, he bursts into song.
## duration event film_date languages main_speaker
## 1 1164 TED2006 1140825600 60 Ken Robinson
## 2 977 TED2006 1140825600 43 Al Gore
## 3 1286 TED2006 1140739200 26 David Pogue
## name num_speaker published_date
## 1 Ken Robinson: Do schools kill creativity? 1 1151367060
## 2 Al Gore: Averting the climate crisis 1 1151367060
## 3 David Pogue: Simplicity sells 1 1151367060
## ratings
## 1 [{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]
## 2 [{'id': 7, 'name': 'Funny', 'count': 544}, {'id': 3, 'name': 'Courageous', 'count': 139}, {'id': 2, 'name': 'Confusing', 'count': 62}, {'id': 1, 'name': 'Beautiful', 'count': 58}, {'id': 21, 'name': 'Unconvincing', 'count': 258}, {'id': 11, 'name': 'Longwinded', 'count': 113}, {'id': 8, 'name': 'Informative', 'count': 443}, {'id': 10, 'name': 'Inspiring', 'count': 413}, {'id': 22, 'name': 'Fascinating', 'count': 132}, {'id': 9, 'name': 'Ingenious', 'count': 56}, {'id': 24, 'name': 'Persuasive', 'count': 268}, {'id': 23, 'name': 'Jaw-dropping', 'count': 116}, {'id': 26, 'name': 'Obnoxious', 'count': 131}, {'id': 25, 'name': 'OK', 'count': 203}]
## 3 [{'id': 7, 'name': 'Funny', 'count': 964}, {'id': 3, 'name': 'Courageous', 'count': 45}, {'id': 9, 'name': 'Ingenious', 'count': 183}, {'id': 1, 'name': 'Beautiful', 'count': 60}, {'id': 21, 'name': 'Unconvincing', 'count': 104}, {'id': 11, 'name': 'Longwinded', 'count': 78}, {'id': 8, 'name': 'Informative', 'count': 395}, {'id': 10, 'name': 'Inspiring', 'count': 230}, {'id': 22, 'name': 'Fascinating', 'count': 166}, {'id': 2, 'name': 'Confusing', 'count': 27}, {'id': 25, 'name': 'OK', 'count': 146}, {'id': 24, 'name': 'Persuasive', 'count': 230}, {'id': 23, 'name': 'Jaw-dropping', 'count': 54}, {'id': 26, 'name': 'Obnoxious', 'count': 142}]
## related_talks
## 1 [{'id': 865, 'hero': 'https://pe.tedcdn.com/images/ted/172559_800x600.jpg', 'speaker': 'Ken Robinson', 'title': 'Bring on the learning revolution!', 'duration': 1008, 'slug': 'sir_ken_robinson_bring_on_the_revolution', 'viewed_count': 7266103}, {'id': 1738, 'hero': 'https://pe.tedcdn.com/images/ted/de98b161ad1434910ff4b56c89de71af04b8b873_1600x1200.jpg', 'speaker': 'Ken Robinson', 'title': "How to escape education's death valley", 'duration': 1151, 'slug': 'ken_robinson_how_to_escape_education_s_death_valley', 'viewed_count': 6657572}, {'id': 2276, 'hero': 'https://pe.tedcdn.com/images/ted/3821f3728e0b755c7b9aea2e69cc093eca41abe1_2880x1620.jpg', 'speaker': 'Linda Cliatt-Wayman', 'title': 'How to fix a broken school? Lead fearlessly, love hard', 'duration': 1027, 'slug': 'linda_cliatt_wayman_how_to_fix_a_broken_school_lead_fearlessly_love_hard', 'viewed_count': 1617101}, {'id': 892, 'hero': 'https://pe.tedcdn.com/images/ted/e79958940573cc610ccb583619a54866c41ef303_2880x1620.jpg', 'speaker': 'Charles Leadbeater', 'title': 'Education innovation in the slums', 'duration': 1138, 'slug': 'charles_leadbeater_on_education', 'viewed_count': 772296}, {'id': 1232, 'hero': 'https://pe.tedcdn.com/images/ted/0e3e4e92d5ee8ae0e43962d447d3f790b31099b8_800x600.jpg', 'speaker': 'Geoff Mulgan', 'title': 'A short intro to the Studio School', 'duration': 376, 'slug': 'geoff_mulgan_a_short_intro_to_the_studio_school', 'viewed_count': 667971}, {'id': 2616, 'hero': 'https://pe.tedcdn.com/images/ted/71cde5a6fa6c717488fb55eff9eef939a9241761_2880x1620.jpg', 'speaker': 'Kandice Sumner', 'title': "How America's public schools keep kids in poverty", 'duration': 830, 'slug': 'kandice_sumner_how_america_s_public_schools_keep_kids_in_poverty', 'viewed_count': 1181333}]
## 2 [{'id': 243, 'hero': 'https://pe.tedcdn.com/images/ted/566c14767bd62c5ff760e483c5b16cd2753328cd_2880x1620.jpg', 'speaker': 'Al Gore', 'title': 'New thinking on the climate crisis', 'duration': 1674, 'slug': 'al_gore_s_new_thinking_on_the_climate_crisis', 'viewed_count': 1751408}, {'id': 547, 'hero': 'https://pe.tedcdn.com/images/ted/89288_800x600.jpg', 'speaker': 'Ray Anderson', 'title': 'The business logic of sustainability', 'duration': 954, 'slug': 'ray_anderson_on_the_business_logic_of_sustainability', 'viewed_count': 881833}, {'id': 2093, 'hero': 'https://pe.tedcdn.com/images/ted/146d88845861cbf768bbf8bec8b2e41f8bfc7903_2400x1800.jpg', 'speaker': 'Lord Nicholas Stern', 'title': 'The state of the climate — and what we might do about it', 'duration': 993, 'slug': 'lord_nicholas_stern_the_state_of_the_climate_and_what_we_might_do_about_it', 'viewed_count': 773779}, {'id': 2784, 'hero': 'https://pe.tedcdn.com/images/ted/e835e670a7836cf65aca2a7a644fd94398cb4b8e_2880x1620.jpg', 'speaker': 'Ted Halstead', 'title': 'A climate solution where all sides can win', 'duration': 787, 'slug': 'ted_halstead_a_climate_solution_where_all_sides_can_win', 'viewed_count': 1021315}, {'id': 2339, 'hero': 'https://pe.tedcdn.com/images/ted/f6879e1c78d4a069735bf745ecfede71ccf4a352_2880x1620.jpg', 'speaker': 'Alice Bows-Larkin', 'title': "Climate change is happening. Here's how we adapt", 'duration': 863, 'slug': 'alice_bows_larkin_we_re_too_late_to_prevent_climate_change_here_s_how_we_adapt', 'viewed_count': 1157975}, {'id': 2331, 'hero': 'https://pe.tedcdn.com/images/ted/64bbcc24af870d9b4e1768abe88e1fe041b02a6a_2880x1620.jpg', 'speaker': 'Mary Robinson', 'title': 'Why climate change is a threat to human rights', 'duration': 1307, 'slug': 'mary_robinson_why_climate_change_is_a_threat_to_human_rights', 'viewed_count': 1126293}]
## 3 [{'id': 1725, 'hero': 'https://pe.tedcdn.com/images/ted/b7f415a054cc0a2bfdd90d0ad5a7f64cf060150d_1600x1200.jpg', 'speaker': 'David Pogue', 'title': '10 top time-saving tech tips', 'duration': 344, 'slug': 'david_pogue_10_top_time_saving_tech_tips', 'viewed_count': 4843421}, {'id': 2274, 'hero': 'https://pe.tedcdn.com/images/ted/608e677e4392bcdcf82b068fa221b9df74a213ef_2880x1620.jpg', 'speaker': 'Tony Fadell', 'title': 'The first secret of design is ... noticing', 'duration': 1001, 'slug': 'tony_fadell_the_first_secret_of_design_is_noticing', 'viewed_count': 2005916}, {'id': 172, 'hero': 'https://pe.tedcdn.com/images/ted/b790be2f87ceffba73fe73837944400c7d61cba2_1600x1200.jpg', 'speaker': 'John Maeda', 'title': 'Designing for simplicity', 'duration': 959, 'slug': 'john_maeda_on_the_simple_life', 'viewed_count': 1215942}, {'id': 2664, 'hero': 'https://pe.tedcdn.com/images/ted/092f184f6625c2aeef10949c8d7b2aa14ba4132b_2880x1620.jpg', 'speaker': 'Dan Bricklin', 'title': 'Meet the inventor of the electronic spreadsheet', 'duration': 720, 'slug': 'dan_bricklin_meet_the_inventor_of_the_electronic_spreadsheet', 'viewed_count': 992322}, {'id': 436, 'hero': 'https://pe.tedcdn.com/images/ted/66438_800x600.jpg', 'speaker': 'David Carson', 'title': 'Design and discovery', 'duration': 1359, 'slug': 'david_carson_on_design', 'viewed_count': 774485}, {'id': 1546, 'hero': 'https://pe.tedcdn.com/images/ted/39d58abd56cf9edd6936f4f42119a15e65692dc1_1600x1200.jpg', 'speaker': 'Clay Shirky', 'title': 'How the Internet will (one day) transform government', 'duration': 1112, 'slug': 'clay_shirky_how_the_internet_will_one_day_transform_government', 'viewed_count': 1245084}]
## speaker_occupation
## 1 Author/educator
## 2 Climate advocate
## 3 Technology columnist
## tags
## 1 ['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']
## 2 ['alternative energy', 'cars', 'climate change', 'culture', 'environment', 'global issues', 'science', 'sustainability', 'technology']
## 3 ['computers', 'entertainment', 'interface design', 'media', 'music', 'performance', 'simplicity', 'software', 'technology']
## title
## 1 Do schools kill creativity?
## 2 Averting the climate crisis
## 3 Simplicity sells
## url
## 1 https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity\n
## 2 https://www.ted.com/talks/al_gore_on_averting_climate_crisis\n
## 3 https://www.ted.com/talks/david_pogue_says_simplicity_sells\n
## views date_pub month year day
## 1 47227110 2006-06-27 6 2006 Tue
## 2 3200520 2006-06-27 6 2006 Tue
## 3 1636292 2006-06-27 6 2006 Tue
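The `published_date` and `film_date` columns are Unix timestamps (seconds since 1970-01-01 UTC). For readers who prefer not to depend on `anytime`, here is a base R sketch of the same kind of conversion. The function name `unix_to_date` is an assumption for illustration, and the time zone is fixed to UTC so the result does not depend on the local machine.

```r
# Hypothetical helper: convert Unix timestamps (seconds since the epoch)
# to Date objects, interpreting them in UTC.
unix_to_date <- function(seconds) {
  as.Date(as.POSIXct(seconds, origin = "1970-01-01", tz = "UTC"), tz = "UTC")
}

published <- unix_to_date(1151367060)  # published_date of the first talk
published
format(published, "%Y")                # year as text
as.POSIXlt(published)$wday             # day of week as a number, 0 = Sunday
```

Using the numeric `wday` field avoids locale-dependent weekday names, which can matter if the day of the week is later used as a grouping variable.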
median_duration <- median(ted$duration)
cat("Median duration: ", median_duration)
## Median duration: 848
print("")
## [1] ""
cat("Mean duration: ", mean(ted$duration))
## Mean duration: 826.5102
# simple r histogram:
# hist(ted$duration)
# a nicer histogram using ggplot, also adding median number of duration line
duration_hist = ggplot(ted,aes(duration,..count..)) +
geom_histogram(fill="turquoise") +
labs(x="Duration",y="How Many Talks in that duration",title="Histogram (Distribution) of Durations of TedTalks") +
#scale_x_continuous(limits=c(0,1500),breaks=seq(0,1500,150)) +
geom_vline(aes(xintercept = median(ted$duration)),linetype=4,size=1,color="white") +
geom_vline(aes(xintercept = mean(ted$duration)),linetype=4,size=1,color="blue")
duration_hist
### Distribution of the number of views per talk
median_views <- median(ted$views)
cat("Median number of views: ", median(ted$views))
## Median number of views: 1124524
cat("Mean number of views: ", mean(ted$views))
## Mean number of views: 1698297
# simple r histogram:
hist(ted$views)
# a nicer histogram using ggplot, also adding median number of views line
views_hist = ggplot(ted,aes(views,..count..)) +
geom_histogram(fill="tomato") +
labs(x="views",y="Count",title="Histogram (Distribution) of Number Of views") +
scale_x_continuous(limits=c(0,10000000)) + #,breaks=seq(0,10000000,10000),labels = function(x) format(x, scientific = FALSE) ) +
#theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_vline(aes(xintercept = median(ted$views)),linetype=4,size=1,color="white") +
geom_vline(aes(xintercept = mean(ted$views)),linetype=4,size=1,color="blue")
views_hist
median_comments <- median(ted$comments)
cat("Median number of comments: ", median(ted$comments))
## Median number of comments: 118
cat("Mean number of comments: ", mean(ted$comments))
## Mean number of comments: 191.5624
# simple r histogram:
hist(ted$comments)
# a nicer histogram using ggplot, also adding median number of comments line
comments_hist = ggplot(ted,aes(comments,..count..)) +
geom_histogram(fill="navy") +
labs(x="Comments",y="Count",title="Histogram (Distribution) of Number Of Comments") +
#scale_x_continuous(limits=c(0,1500),breaks=seq(0,1500,150)) +
geom_vline(aes(xintercept = median(ted$comments)),linetype=4,size=1,color="white") +
geom_vline(aes(xintercept = mean(ted$comments)),linetype=4,size=1,color="blue")
comments_hist
The number of comments is strongly right-skewed: most talks cluster near 0 comments, with a long tail of heavily discussed talks. Therefore the mean (191.6) is not the most representative value; the median (118) describes a typical talk better.
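To make the skew claim concrete, here is a minimal sketch that computes the sample skewness and compares mean and median. It runs on simulated counts, not the TED data: `fake_comments` and `sample_skewness` are names I made up for illustration, and the negative binomial parameters are only chosen to look roughly like a comment-count distribution.

```r
# Sample skewness (third standardized moment); positive means a long right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

set.seed(1)
# Hypothetical stand-in for ted$comments: counts piled up near zero with a long tail.
fake_comments <- rnbinom(2550, size = 1.5, mu = 190)

skew <- sample_skewness(fake_comments)
cat("Skewness:", round(skew, 2), "\n")
cat("Mean:", round(mean(fake_comments), 1), " Median:", median(fake_comments), "\n")
```

A positive skewness together with a mean above the median is the usual signature of a right-skewed variable; on the real data one would call `sample_skewness(ted$comments)` instead.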
Let’s explore occupations. First, let’s see the 10 most common occupations.
occupation_df <- data.frame(table(ted$speaker_occupation))
colnames(occupation_df) <- c("occupation", "appearances")
occupation_df <- occupation_df %>% arrange(desc(appearances))
head(occupation_df, 10)
## occupation appearances
## 1 Writer 45
## 2 Artist 34
## 3 Designer 34
## 4 Journalist 33
## 5 Entrepreneur 31
## 6 Architect 30
## 7 Inventor 27
## 8 Psychologist 26
## 9 Photographer 25
## 10 Filmmaker 21
ggplot(head(occupation_df,30),
aes(x=reorder(occupation, appearances),
y=appearances, fill=occupation)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_bar(stat="identity") +
guides(fill="none")
ted_occupation_summary = ted %>% group_by(speaker_occupation) %>% summarise(meanviews=mean(views)) %>% arrange(desc(meanviews))
ggplot(ted_occupation_summary,
aes(factor(speaker_occupation,levels=speaker_occupation), meanviews,fill=meanviews))+
geom_bar(stat="identity")+
labs(x="Occupation" ,y="Mean Views",title="Occupation Vs Views")
#scale_fill_brewer(name="Occupation",palette = "Set2")#+
#scale_y_discrete(labels=scales::comma)
Days of week Published vs Views
# group by day and compute mean views, using dplyr
ted_by_day = ted %>% group_by(day) %>% summarise(mean_views=mean(views)) %>% arrange(desc(mean_views))
ted_by_day$day <- factor(ted_by_day$day, levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
ggplot(data = ted_by_day, aes(x = factor(day), y = mean_views, fill = mean_views)) +
geom_bar(stat="identity") +
labs(x="Day of week published",y="Mean Views",title="Mean Views Per Day of Week")
So day of week does seem to have some association with average (mean) views! TED talks published on weekends seem to get far fewer views, with Saturday being the lowest, while talks published on Friday have the highest mean views. Since the major effect seems to be “weekend or not”, I’m creating a binary dummy variable for whether the publication date falls on a weekend, to regress upon later.
# creating a is_weekend variable
library(chron)
ted$weekend <- as.numeric(is.weekend(ted$date_pub))
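The same weekend flag can probably be built without the chron dependency. The sketch below uses only base R; `is_weekend_base` is a hypothetical helper name, and the example dates are ones I picked for illustration (the first is the publication date shown in the data preview above).

```r
# Flag weekend dates using base R only.
# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, independent of locale.
is_weekend_base <- function(d) {
  wday <- as.POSIXlt(as.Date(d))$wday
  as.numeric(wday %in% c(0, 6))
}

# A Tuesday, a Saturday and a Sunday
example_dates <- as.Date(c("2006-06-27", "2018-01-27", "2018-01-28"))
weekend_flag <- is_weekend_base(example_dates)
weekend_flag
```

Using the numeric weekday avoids comparing against day names such as "Sat", which depend on the machine's locale.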
Insights and description from the distributions:
* Number of views follows a Poisson-like, right-skewed distribution: most talks sit at the low end, with a long tail of very popular talks.
* Duration of talks is closer to a normal distribution, with a mean of about 13.8 minutes (826.5 seconds) and a median of about 14.1 minutes (848 seconds), plus a tail of a few talks at much longer durations. Almost all talks run between 1 and 18 minutes (the maximum length of a normal TED talk).
* Number of comments is Poisson-like (visually resembling an exponential distribution, though technically it is not one, since comments are discrete counts), strongly concentrated near the minimum of 0 comments for the unpopular videos.
* Number of tags is also Poisson-like and right-skewed, peaking between 4 and 7 tags.
* Number of translation languages has a peak at 0 for unpopular talks, but most talks are offered in 20-40 languages.
* Few talks have film dates before 2012, whereas publication dates are spread at a much more uniform rate.
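One quick way to check how "Poisson" a count variable really is: for a true Poisson variable the variance equals the mean, so the ratio of the two should be close to 1. The sketch below illustrates this on simulated data only; `dispersion_index` is a name I introduce here, and the two simulated vectors are assumptions, not TED columns.

```r
# Variance-to-mean ratio: about 1 for Poisson data, much larger when overdispersed.
dispersion_index <- function(x) var(x) / mean(x)

set.seed(2)
poisson_counts <- rpois(2550, lambda = 190)               # textbook Poisson
overdispersed  <- rnbinom(2550, size = 1.5, mu = 190)     # heavy right tail

d_pois <- dispersion_index(poisson_counts)
d_over <- dispersion_index(overdispersed)
cat("Poisson dispersion index:      ", round(d_pois, 2), "\n")
cat("Overdispersed dispersion index:", round(d_over, 2), "\n")
```

If `dispersion_index(ted$comments)` or `dispersion_index(ted$views)` comes out far above 1, the variable only resembles a Poisson distribution in shape, and "Poisson-like" is the safer description.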
First is a correlation (pairs) scatterplot matrix between each pair of numerical variables. Below it is a correlation matrix with colors representing the intensity of the correlation, from 0 (white) to dark blue (+1) or dark striped red (strong negative correlation, -1), and with asterisks (***) marking significance by p-value.
colnames(ted)
## [1] "comments" "description" "duration"
## [4] "event" "film_date" "languages"
## [7] "main_speaker" "name" "num_speaker"
## [10] "published_date" "ratings" "related_talks"
## [13] "speaker_occupation" "tags" "title"
## [16] "url" "views" "date_pub"
## [19] "month" "year" "day"
## [22] "num_tags" "weekend"
# select by name rather than position: columns 17, 1, 3, 6, 10 and 23 above
col_numeric = c("views","comments","duration","languages","published_date","weekend")
ted_numeric = ted[,col_numeric]
pairs(ted_numeric)
We see some correlations there, but only a few clear ones: between views, comments and languages. There is no clear correlation between views and duration or published date.
cor_matrix <- cor(ted_numeric)
res <- cor.mtest(ted_numeric, conf.level = .95)
res
## $p
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.000000e+00 1.803322e-185 1.383469e-02 3.140578e-87 3.657160e-01
## [2,] 1.803322e-185 0.000000e+00 9.558285e-13 3.950633e-61 2.869772e-21
## [3,] 1.383469e-02 9.558285e-13 0.000000e+00 1.279484e-52 2.819913e-17
## [4,] 3.140578e-87 3.950633e-61 1.279484e-52 0.000000e+00 2.370426e-18
## [5,] 3.657160e-01 2.869772e-21 2.819913e-17 2.370426e-18 0.000000e+00
## [6,] 5.820279e-04 6.932328e-01 1.003110e-01 8.566019e-35 2.459542e-05
## [,6]
## [1,] 5.820279e-04
## [2,] 6.932328e-01
## [3,] 1.003110e-01
## [4,] 8.566019e-35
## [5,] 2.459542e-05
## [6,] 0.000000e+00
##
## $lowCI
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000000000 0.50247793 0.009942831 0.3438467 -0.05669662
## [2,] 0.502477926 1.00000000 0.102436668 0.2829638 -0.22314183
## [3,] 0.009942831 0.10243667 1.000000000 -0.3307018 -0.20382475
## [4,] 0.343846739 0.28296377 -0.330701766 1.0000000 -0.20925668
## [5,] -0.056696624 -0.22314183 -0.203824754 -0.2092567 1.00000000
## [6,] -0.106608323 -0.04661768 -0.006273864 -0.2764461 -0.12185536
## [,6]
## [1,] -0.106608323
## [2,] -0.046617685
## [3,] -0.006273864
## [4,] -0.276446094
## [5,] -0.121855358
## [6,] 1.000000000
##
## $uppCI
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00000000 0.5582501 0.08739150 0.4104235 0.02091130 -0.02933472
## [2,] 0.55825006 1.0000000 0.17853504 0.3527427 -0.14818904 0.03101040
## [3,] 0.08739150 0.1785350 1.00000000 -0.2598468 -0.12833640 0.07127682
## [4,] 0.41042349 0.3527427 -0.25984683 1.0000000 -0.13391282 -0.20328629
## [5,] 0.02091130 -0.1481890 -0.12833640 -0.1339128 1.00000000 -0.04476215
## [6,] -0.02933472 0.0310104 0.07127682 -0.2032863 -0.04476215 1.00000000
corrplot::corrplot(cor_matrix,method="shade",bg="white",title="Correlation Matrix (by color) and significance (*)",
p.mat = res$p, sig.level = c(.001, .01, .05), pch.cex = .9, insig = "label_sig", pch.col = "white")
These correlation plots and the correlation matrix lead to the following conclusions:
* The strongest positive correlation is between number of comments and views (about 0.53), which makes sense (more audience, more comments).
* There is some positive correlation between the number of translation languages and both number of views (0.38) and number of comments (0.32).
* There is a small negative correlation between duration and number of languages (about -0.30): the shorter the talk, the more translated languages there are, probably because shorter talks are easier to translate.
Most of the parameters don’t have strong correlations. Naturally, there was a very high correlation between published date and filmed date. Filmed date seemed less associated with views, both numerically and logically, since the audience is more affected by the date a TED talk is released than by whether it was recorded a month or a year earlier.
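For a single pair of variables, the estimate, confidence interval and p-value reported in a matrix like the one above can be reproduced with base R's `cor.test`. The sketch below uses simulated vectors `x` and `y` as hypothetical stand-ins for two numeric columns; the slope of 0.4 is an arbitrary assumption.

```r
set.seed(3)
n <- 500
# Hypothetical stand-ins for two numeric columns, e.g. languages and views.
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)

ct <- cor.test(x, y, conf.level = 0.95)   # Pearson correlation by default
r  <- unname(ct$estimate)
ci <- ct$conf.int
p  <- ct$p.value
cat(sprintf("r = %.2f, 95%% CI [%.2f, %.2f], p = %.3g\n", r, ci[1], ci[2], p))
```

Reading one pair at a time this way makes it easier to see that a tiny p-value only says the correlation is unlikely to be zero; the size of `r` is what tells you how strong the association is.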
How do these variables correlate with views? Is the relationship linear, nonlinear, or absent? This matters for deciding how to enter each variable into the regression, if at all.
Here I’ll start inspecting the relationship between each variable and the dependent variable, and what form would suit it in the regression (linear? polynomial? factor? none?).
# Views By Duration
ggplot(ted, aes(duration, views, size = views, col = duration)) +
geom_point(alpha=0.8) +
geom_smooth(method = loess, colour="White") +
geom_smooth(method = lm, colour="Pink") +
labs(x="Duration",y="Views",title="Views By Duration") #+
# Views By Comments
ggplot(ted, aes(comments, views, size = views, col = comments)) +
geom_point(alpha=0.8) +
geom_smooth(method = loess, colour="White") +
geom_smooth(method = lm, colour="Pink") +
labs(x="Comments",y="Views",title="Views By Comments") #+
# Views By Languages
ggplot(ted, aes(languages, views, size = views, col = languages)) +
geom_point(alpha=0.8) +
geom_smooth(method = loess, colour="White") +
geom_smooth(method = lm, colour="Pink") +
labs(x="Languages",y="Views",title="Views By Languages")
# Views By Number of Tags
ggplot(ted, aes(num_tags, views, size = views, col = num_tags)) +
geom_point(alpha=0.8) +
geom_smooth(method = loess, colour="White") +
geom_smooth(method = lm, colour="Pink") +
labs(x="Number of Tags",y="Views",title="Views By Number of Tags")
It seems that none of these lose much information under a linear regression compared with a LOESS regression, which is flexible enough that it would reveal a clear nonlinear shape if there were one. While some of them do show nonlinear shapes, a closer look shows this happens only in the tails, where data is scarce and the fit is biased by a few data points and outliers (as in the Comments plot). Therefore, entering each regressor as a linear term should be sufficiently explanatory.
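The visual comparison can be backed by a number: fit both models and compare their residual sums of squares. This is only a sketch on simulated data where the true relationship is linear by construction; `sim`, its coefficients and the noise level are assumptions for illustration.

```r
set.seed(4)
n <- 300
sim <- data.frame(x = runif(n, 0, 100))
sim$y <- 5 + 2 * sim$x + rnorm(n, sd = 20)   # truly linear relationship plus noise

fit_lm    <- lm(y ~ x, data = sim)
fit_loess <- loess(y ~ x, data = sim)        # default span = 0.75

rss_lm    <- sum(residuals(fit_lm)^2)
rss_loess <- sum(residuals(fit_loess)^2)
rss_ratio <- rss_loess / rss_lm
cat("RSS ratio (loess / lm):", round(rss_ratio, 3), "\n")
```

A ratio close to 1 means the flexible fit explains hardly anything the straight line misses. A ratio well below 1 on the real variables would be a hint to try a nonlinear term.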
The scatterplots above show a flexible LOESS fit in white and a linear regression in pink, to see how much a linear shape differs from a flexible local fit. Except in the tails of these variables’ distributions, where there are only a few outlying data points, the linear model describes the relationship reasonably well.
Surprisingly, duration had almost no consistent correlation with number of views, except that the most popular talks were mostly 8-20 minutes long. Number of comments is, unsurprisingly, well correlated with number of views, and so is number of languages: all three come from having many viewers. It is therefore not “fair” to predict views from these factors, and in the real world we couldn’t use them as predictors, since they are not only causes of more views but also results of them, in a reinforcing feedback loop: the more comments, the more engaged the community around the talk is and the likelier it is to spread; the more languages, the more viewers can watch; and the more viewers, the more audience there is to comment and translate. The remaining variables had only a small, roughly linear effect.
I inspected how well each of the relevant variables explains the view count individually, and then how adding the relevant ones to a multiple regression improves its performance.
fit1 <- lm(views ~ comments, data = ted)
summary(fit1)
##
## Call:
## lm(formula = views ~ comments, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26514433 -667711 -254429 229873 31596994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 798186.6 50681.6 15.75 <2e-16 ***
## comments 4698.8 148.6 31.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2118000 on 2548 degrees of freedom
## Multiple R-squared: 0.2819, Adjusted R-squared: 0.2816
## F-statistic: 1000 on 1 and 2548 DF, p-value: < 2.2e-16
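With a single predictor, R-squared is just the squared Pearson correlation, so the 0.2819 above corresponds to a correlation of about 0.53 between comments and views (the square root of 0.2819). A small sketch on simulated data shows the identity; the data frame `sim` and its coefficients are made up for illustration.

```r
set.seed(5)
n <- 1000
# Hypothetical data loosely shaped like comments and views.
sim <- data.frame(comments = rnbinom(n, size = 1.5, mu = 190))
sim$views <- 800000 + 4700 * sim$comments + rnorm(n, sd = 2e6)

fit <- lm(views ~ comments, data = sim)
r_squared <- summary(fit)$r.squared
r <- cor(sim$views, sim$comments)

cat("R-squared:", round(r_squared, 4), " r^2:", round(r^2, 4), "\n")
```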
fit2 <- lm(views ~ languages, data = ted)
summary(fit2)
##
## Call:
## lm(formula = views ~ languages, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4235090 -887422 -418475 287948 42305382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -997579 138744 -7.19 8.47e-13 ***
## languages 98655 4792 20.59 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2314000 on 2548 degrees of freedom
## Multiple R-squared: 0.1426, Adjusted R-squared: 0.1423
## F-statistic: 423.8 on 1 and 2548 DF, p-value: < 2.2e-16
fit3 <- lm(views ~ duration, data = ted)
summary(fit3)
##
## Call:
## lm(formula = views ~ duration, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2667314 -932675 -554748 28483 45418926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1429186.7 119912.2 11.919 <2e-16 ***
## duration 325.6 132.2 2.463 0.0138 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2496000 on 2548 degrees of freedom
## Multiple R-squared: 0.002376, Adjusted R-squared: 0.001984
## F-statistic: 6.068 on 1 and 2548 DF, p-value: 0.01383
# Adding date-related information using just a weekend binary
fit4 <- lm(views ~ weekend, data = ted)
summary(fit4)
##
## Call:
## lm(formula = views ~ weekend, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1683960 -934543 -576696 5606 45492707
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1734403 50473 34.363 < 2e-16 ***
## weekend -836986 243014 -3.444 0.000582 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2493000 on 2548 degrees of freedom
## Multiple R-squared: 0.004634, Adjusted R-squared: 0.004243
## F-statistic: 11.86 on 1 and 2548 DF, p-value: 0.000582
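A regression on a single 0/1 dummy is a comparison of two group means: the intercept is the mean of the 0 group and the slope is the difference between the groups. Read that way, the output above says weekday talks average about 1,734,403 views and weekend talks about 1,734,403 - 836,986 = 897,417. The sketch below checks the identity on simulated data; `sim` and its parameters are assumptions, not TED values.

```r
set.seed(6)
n <- 400
# Hypothetical data: roughly 10% of talks published on a weekend.
sim <- data.frame(weekend = rbinom(n, 1, 0.1))
sim$views <- 1700000 - 800000 * sim$weekend + rnorm(n, sd = 500000)

fit <- lm(views ~ weekend, data = sim)
group_means <- tapply(sim$views, sim$weekend, mean)

slope     <- unname(coef(fit)["weekend"])
intercept <- unname(coef(fit)["(Intercept)"])
mean_diff <- unname(group_means["1"] - group_means["0"])

cat("Slope:", round(slope), " Difference in group means:", round(mean_diff), "\n")
```

This also shows why the R-squared stays tiny: a difference in means can be clearly non-zero while the spread within each group dwarfs it.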
# weekend was significant, with a large slope (-836986) but didn't explain data well alone.
fit5 <- lm(views ~ num_tags, data = ted)
summary(fit5)
##
## Call:
## lm(formula = views ~ num_tags, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1710677 -959176 -550980 24372 45516020
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1886196 99359 18.98 <2e-16 ***
## num_tags -25015 11474 -2.18 0.0293 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2497000 on 2548 degrees of freedom
## Multiple R-squared: 0.001862, Adjusted R-squared: 0.00147
## F-statistic: 4.753 on 1 and 2548 DF, p-value: 0.02933
Comments and languages were significantly and (relatively strongly) positively correlated with views. Duration and number of tags were significant at the 5% level but explained almost none of the variance (R-squared of about 0.002 each). Day of week seemed to have some association: weekend days reduced the views. I therefore regress on just a binary weekend-or-weekday variable, since the day of the week as a number didn’t have a clear correlation.
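One way to see how much each added variable improves a multiple regression is to build nested models and compare their adjusted R-squared, optionally with an F-test via `anova`. This is a sketch on simulated data under assumed coefficients; the column names mirror the TED variables, but none of the numbers come from the dataset.

```r
set.seed(7)
n <- 1000
# Hypothetical predictors shaped loosely like the TED columns.
sim <- data.frame(
  comments  = rnbinom(n, size = 1.5, mu = 190),
  languages = rpois(n, 27),
  weekend   = rbinom(n, 1, 0.1)
)
sim$views <- 200000 + 4000 * sim$comments + 40000 * sim$languages -
  500000 * sim$weekend + rnorm(n, sd = 1.5e6)

# Nested models: each one adds a single regressor to the previous model.
m1 <- lm(views ~ comments, data = sim)
m2 <- update(m1, . ~ . + languages)
m3 <- update(m2, . ~ . + weekend)

adj_r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3),
                 function(m) summary(m)$adj.r.squared)
print(round(adj_r2, 4))
print(anova(m1, m2, m3))   # F-tests for each added term
```

Plain R-squared can never go down when a term is added, so adjusted R-squared and the F-test are the more honest guides to whether a variable earns its place.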
# Excluded analysis
summary(lm(views ~ event, data = ted))
##
## Call:
## lm(formula = views ~ event, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15027899 -850943 -300075 86910 43952765
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 149818 2447735 0.061
## eventArbejdsglaede Live 821776 3461619 0.237
## eventBBC TV 372156 3461619 0.108
## eventBowery Poetry Club 526923 3461619 0.152
## eventBusiness Innovation Factory 154268 2826400 0.055
## eventCarnegie Mellon University 414963 3461619 0.120
## eventChautauqua Institution 111051 2826400 0.039
## eventDICE Summit 2010 299343 3461619 0.086
## eventDLD 2007 613997 3461619 0.177
## eventEG 2007 887449 2540134 0.349
## eventEG 2008 1898441 2596215 0.731
## eventElizabeth G. Anderson School 876771 3461619 0.253
## eventEric Whitacre's Virtual Choir 249462 3461619 0.072
## eventFort Worth City Council 128854 3461619 0.037
## eventFull Spectrum Auditions 1022540 2826400 0.362
## eventGel Conference 506574 2997850 0.169
## eventGlobal Witness HQ 865865 3461619 0.250
## eventHandheld Learning 101495 3461619 0.029
## eventHarvard University 1256333 3461619 0.363
## eventINK Conference 1493142 2643856 0.565
## eventJustice with Michael Sandel 243641 3461619 0.070
## eventLIFT 2007 1337035 3461619 0.386
## eventMichael Howard Studios 27177 3461619 0.008
## eventMission Blue II 1179654 2596215 0.454
## eventMission Blue Voyage 385465 2514808 0.153
## eventNew York State Senate 142577 3461619 0.041
## eventNextGen:Charity 143808 3461619 0.042
## eventPrinceton University 509846 3461619 0.147
## eventRoyal Institution 168605 3461619 0.049
## eventRSA Animate 865947 2826400 0.306
## eventSerious Play 2008 776122 2596215 0.299
## eventSkoll World Forum 2007 268416 3461619 0.078
## eventSoulPancake 678385 3461619 0.196
## eventStanford University 4508818 2997850 1.504
## eventTaste3 2008 887956 2643856 0.336
## eventTED Dialogues 1052566 2997850 0.351
## eventTED Fellows 2015 1146696 3461619 0.331
## eventTED Fellows Retreat 2013 741501 2681359 0.277
## eventTED Fellows Retreat 2015 1608955 2616738 0.615
## eventTED in the Field 506328 2736650 0.185
## eventTED Prize Wish 271394 3461619 0.078
## eventTED Residency 817780 2681359 0.305
## eventTED Senior Fellows at TEDGlobal 2010 167309 3461619 0.048
## eventTED Studio 1261637 2736650 0.461
## eventTED Talks Education 4225286 2596215 1.627
## eventTED Talks Live 1346068 2508182 0.537
## eventTED-Ed 233905 2580139 0.091
## eventTED-Ed Weekend 908124 3461619 0.262
## eventTED@Bangalore 965404 3461619 0.279
## eventTED@BCG Berlin 1676768 2997850 0.559
## eventTED@BCG London 2279763 2736650 0.833
## eventTED@BCG Paris 1319829 2556575 0.516
## eventTED@BCG San Francisco 1661422 2616738 0.635
## eventTED@BCG Singapore 1120376 2826400 0.396
## eventTED@Cannes 1249062 2681359 0.466
## eventTED@IBM 1427470 2567206 0.556
## eventTED@Intel 771995 2826400 0.273
## eventTED@Johannesburg 1444677 3461619 0.417
## eventTED@London 973968 3461619 0.281
## eventTED@MotorCity 567778 2997850 0.189
## eventTED@Nairobi 777130 3461619 0.224
## eventTED@New York 1499272 2736650 0.548
## eventTED@NYC 1577687 2567206 0.615
## eventTED@State 777748 2681359 0.290
## eventTED@State Street Boston 1222452 2643856 0.462
## eventTED@State Street London 2497339 3461619 0.721
## eventTED@StateStreet Boston 1527552 3461619 0.441
## eventTED@SXSWi 505588 3461619 0.146
## eventTED@Unilever 1643349 2826400 0.581
## eventTED@UPS 1648480 2681359 0.615
## eventTED1984 824269 3461619 0.238
## eventTED1990 470988 3461619 0.136
## eventTED1994 431601 3461619 0.125
## eventTED1998 601068 2643856 0.227
## eventTED2001 1709131 2826400 0.605
## eventTED2002 801657 2491061 0.322
## eventTED2003 961384 2483470 0.387
## eventTED2004 2543826 2486901 1.023
## eventTED2005 1636369 2480592 0.660
## eventTED2006 3124527 2474782 1.263
## eventTED2007 1361313 2465667 0.552
## eventTED2008 1888829 2469113 0.765
## eventTED2009 1605078 2462436 0.652
## eventTED2010 1648253 2465667 0.668
## eventTED2011 1818046 2465156 0.737
## eventTED2012 2073222 2466491 0.841
## eventTED2013 2152882 2463578 0.874
## eventTED2014 1923056 2462261 0.781
## eventTED2015 1861199 2463999 0.755
## eventTED2016 1662804 2463578 0.675
## eventTED2017 1038440 2465934 0.421
## eventTEDActive 2011 1106268 2826400 0.391
## eventTEDActive 2014 1317743 2997850 0.440
## eventTEDActive 2015 1381244 3461619 0.399
## eventTEDCity2.0 967738 2556575 0.379
## eventTEDGlobal 2005 1489245 2494362 0.597
## eventTEDGlobal 2007 412332 2492651 0.165
## eventTEDGlobal 2009 1529203 2466491 0.620
## eventTEDGlobal 2010 1189092 2469886 0.481
## eventTEDGlobal 2011 1567943 2465667 0.636
## eventTEDGlobal 2012 1922618 2465156 0.780
## eventTEDGlobal 2013 2434345 2466208 0.987
## eventTEDGlobal 2014 1166348 2471616 0.472
## eventTEDGlobal 2017 406263 2826400 0.144
## eventTEDGlobal>Geneva 3235590 2556575 1.266
## eventTEDGlobal>London 1914689 2540134 0.754
## eventTEDGlobalLondon 2390457 2596215 0.921
## eventTEDIndia 2009 1209267 2482456 0.487
## eventTEDLagos Ideas Search 1098726 2997850 0.367
## eventTEDMED 2009 2096326 2556575 0.820
## eventTEDMED 2010 326495 2736650 0.119
## eventTEDMED 2011 622232 2616738 0.238
## eventTEDMED 2012 1202115 2567206 0.468
## eventTEDMED 2013 1532082 2596215 0.590
## eventTEDMED 2014 1594178 2567206 0.621
## eventTEDMED 2015 1903529 2596215 0.733
## eventTEDMED 2016 1091313 2567206 0.425
## eventTEDNairobi Ideas Search 547029 3461619 0.158
## eventTEDNYC 1092920 2511323 0.435
## eventTEDPrize@UN 643944 2997850 0.215
## eventTEDSalon 2006 947072 2736650 0.346
## eventTEDSalon 2007 Hot Science 737998 2997850 0.246
## eventTEDSalon 2009 Compassion 131998 2826400 0.047
## eventTEDSalon Berlin 2014 1554534 2567206 0.606
## eventTEDSalon London 2009 806357 3461619 0.233
## eventTEDSalon London 2010 1235348 2616738 0.472
## eventTEDSalon London Fall 2011 2079182 2997850 0.694
## eventTEDSalon London Fall 2012 2224279 2643856 0.841
## eventTEDSalon London Spring 2011 902113 2643856 0.341
## eventTEDSalon London Spring 2012 937842 2681359 0.350
## eventTEDSalon NY2011 1376771 2643856 0.521
## eventTEDSalon NY2012 802688 2736650 0.293
## eventTEDSalon NY2013 2711822 2556575 1.061
## eventTEDSalon NY2014 1935574 2556575 0.757
## eventTEDSalon NY2015 1561231 3461619 0.451
## eventTEDSummit 1380145 2483470 0.556
## eventTEDWomen 2010 1148268 2483470 0.462
## eventTEDWomen 2013 2355045 2580139 0.913
## eventTEDWomen 2015 1300084 2491061 0.522
## eventTEDWomen 2016 1190539 2496209 0.477
## eventTEDxABQ 148958 3461619 0.043
## eventTEDxAmazonia 835155 3461619 0.241
## eventTEDxAmericanRiviera 4234679 3461619 1.223
## eventTEDxAmoskeagMillyard 910036 3461619 0.263
## eventTEDxAmsterdam 680676 2736650 0.249
## eventTEDxArendal 1915378 3461619 0.553
## eventTEDxAsheville 268833 3461619 0.078
## eventTEDxAthens 1654373 2997850 0.552
## eventTEDxAtlanta 518636 3461619 0.150
## eventTEDxAustin 1028499 2736650 0.376
## eventTEDxBeaconStreet 2057677 2502747 0.822
## eventTEDxBeirut 1083286 3461619 0.313
## eventTEDxBend 2831865 2997850 0.945
## eventTEDxBerkeley 616554 2997850 0.206
## eventTEDxBerlin 866083 2826400 0.306
## eventTEDxBG 452385 3461619 0.131
## eventTEDxBinghamtonUniversity 4190299 3461619 1.211
## eventTEDxBloomington 9334442 2997850 3.114
## eventTEDxBoston 1036808 2826400 0.367
## eventTEDxBoston 2009 347009 3461619 0.100
## eventTEDxBoston 2010 666201 3461619 0.192
## eventTEDxBoston 2011 512720 2643856 0.194
## eventTEDxBoston 2012 772035 2643856 0.292
## eventTEDxBoulder 1649480 2826400 0.584
## eventTEDxBoulder 2011 931911 2997850 0.311
## eventTEDxBratislava 1541938 2997850 0.514
## eventTEDxBrighton 727288 3461619 0.210
## eventTEDxBrussels 992203 2596215 0.382
## eventTEDxCaFoscariU 1037287 3461619 0.300
## eventTEDxCaltech 1099048 2596215 0.423
## eventTEDxCambridge 1458830 2596215 0.562
## eventTEDxCanberra 453356 2826400 0.160
## eventTEDxCannes 612223 3461619 0.177
## eventTEDxCERN 1186139 2616738 0.453
## eventTEDxChange 882572 2736650 0.323
## eventTEDxChapmanU 2948658 3461619 0.852
## eventTEDxCHUV 4532640 3461619 1.309
## eventTEDxClaremontColleges 1195626 3461619 0.345
## eventTEDxCMU 1323306 2997850 0.441
## eventTEDxColbyCollege 1273819 3461619 0.368
## eventTEDxColoradoSprings 876199 3461619 0.253
## eventTEDxColumbus 978040 2997850 0.326
## eventTEDxColumbusWomen 1882933 3461619 0.544
## eventTEDxConcorde 1044078 3461619 0.302
## eventTEDxConcordiaUPortland 3120541 3461619 0.901
## eventTEDxCreativeCoast 8295163 3461619 2.396
## eventTEDxCrenshaw 594975 3461619 0.172
## eventTEDxDanubia 1387572 3461619 0.401
## eventTEDxDeExtinction 601616 2997850 0.201
## eventTEDxDelft 769486 2997850 0.257
## eventTEDxDesMoines 1130007 3461619 0.326
## eventTEDxDirigo 41737 3461619 0.012
## eventTEDxDU 2010 750288 2997850 0.250
## eventTEDxDU 2011 219545 3461619 0.063
## eventTEDxDubai 1474923 3461619 0.426
## eventTEDxDublin 387551 2826400 0.137
## eventTEDxEast 579030 2681359 0.216
## eventTEDxEastEnd 1661284 3461619 0.480
## eventTEDxEdmonton 1429215 3461619 0.413
## eventTEDxEQChCh 2679666 3461619 0.774
## eventTEDxEuston 1168636 3461619 0.338
## eventTEDxExeter 1088551 2580139 0.422
## eventTEDxFiDiWomen 1475958 3461619 0.426
## eventTEDxFrankfurt 758848 3461619 0.219
## eventTEDxFulbrightDublin 902834 3461619 0.261
## eventTEDxGatewayWomen 1278265 3461619 0.369
## eventTEDxGeorgetown 510751 3461619 0.148
## eventTEDxGhent 696160 2997850 0.232
## eventTEDxGlasgow 892466 3461619 0.258
## eventTEDxGoldenGatePark 2012 4661362 3461619 1.347
## eventTEDxGoodenoughCollege 329060 3461619 0.095
## eventTEDxGöteborg 2010 313548 3461619 0.091
## eventTEDxGrandRapids 1316277 3461619 0.380
## eventTEDxGreatPacificGarbagePatch 295753 2826400 0.105
## eventTEDxGroningen 1303424 3461619 0.377
## eventTEDxHamburg 389389 3461619 0.112
## eventTEDxHampshireCollege 729858 3461619 0.211
## eventTEDxHelvetia 1045120 3461619 0.302
## eventTEDxHogeschoolUtrecht 488429 3461619 0.141
## eventTEDxHousesOfParliament 746927 2643856 0.283
## eventTEDxHouston 15990432 2997850 5.334
## eventTEDxImperialCollege 1035115 3461619 0.299
## eventTEDxIndianapolis 3211287 3461619 0.928
## eventTEDxIndianaUniversity 984707 3461619 0.284
## eventTEDxIslay -29544 3461619 -0.009
## eventTEDxJaffa 2012 1883979 3461619 0.544
## eventTEDxJaffa 2013 2781548 3461619 0.804
## eventTEDxKC 1170502 2736650 0.428
## eventTEDxKids@Ambleside 2804218 3461619 0.810
## eventTEDxKids@Brussels 288561 3461619 0.083
## eventTEDxKrakow 650600 2997850 0.217
## eventTEDxKyoto 3048519 2997850 1.017
## eventTEDxLeuvenSalon 1040006 3461619 0.300
## eventTEDxLinnaeusUniversity 4835066 3461619 1.397
## eventTEDxLondonBusinessSchool 731543 3461619 0.211
## eventTEDxMaastricht 339533 2736650 0.124
## eventTEDxManchester 626823 2997850 0.209
## eventTEDxManhattan 1399935 2997850 0.467
## eventTEDxManhattanBeach 3101886 2826400 1.097
## eventTEDxMarin 1673864 2826400 0.592
## eventTEDxMaui 1654884 2997850 0.552
## eventTEDxMet 4057808 2997850 1.354
## eventTEDxMIA 199174 3461619 0.058
## eventTEDxMiamiUniversity 257601 3461619 0.074
## eventTEDxMidAtlantic 2155786 2518698 0.856
## eventTEDxMidAtlantic 2013 1447978 2997850 0.483
## eventTEDxMidwest 2330102 2826400 0.824
## eventTEDxMileHigh 736408 2736650 0.269
## eventTEDxMonroeCorrectionalComplex 663145 3461619 0.192
## eventTEDxMonterey 17017 3461619 0.005
## eventTEDxMontreal 421191 3461619 0.122
## eventTEDxMtHood 3314949 3461619 0.958
## eventTEDxMuncyStatePrison 1160260 3461619 0.335
## eventTEDxNASA 1122486 2997850 0.374
## eventTEDxNASA@SiliconValley 6077 3461619 0.002
## eventTEDxNatick 733022 3461619 0.212
## eventTEDxNewy 723412 3461619 0.209
## eventTEDxNewYork 1645449 2643856 0.622
## eventTEDxNextGenerationAsheville 2002916 3461619 0.579
## eventTEDxNijmegen 640773 3461619 0.185
## eventTEDxNorrkoping 6419675 3461619 1.855
## eventTEDxNorthwesternU 921314 3461619 0.266
## eventTEDxNYED 1277752 2997850 0.426
## eventTEDxO'Porto 141433 3461619 0.041
## eventTEDxObserver 12709 2997850 0.004
## eventTEDxOilSpill 272667 2826400 0.096
## eventTEDxOmaha 1464706 3461619 0.423
## eventTEDxOrangeCoast 1474366 3461619 0.426
## eventTEDxOrcasIsland 827772 3461619 0.239
## eventTEDxOslo 1503846 2997850 0.502
## eventTEDxParis 2010 382504 2997850 0.128
## eventTEDxParis 2012 3691032 3461619 1.066
## eventTEDxPeachtree 1642753 2826400 0.581
## eventTEDxPenn 1199435 3461619 0.346
## eventTEDxPennQuarter 2285024 3461619 0.660
## eventTEDxPennsylvaniaAvenue 598667 3461619 0.173
## eventTEDxPerth 1621258 2997850 0.541
## eventTEDxPhoenix 134868 2736650 0.049
## eventTEDxPittsburgh 58297 3461619 0.017
## eventTEDxPlaceDesNations 860569 2997850 0.287
## eventTEDxPortland 2609233 3461619 0.754
## eventTEDxPortofSpain 333713 2997850 0.111
## eventTEDxProvidence 1264903 3461619 0.365
## eventTEDxPSU 1233463 2616738 0.471
## eventTEDxPuget Sound 34159614 3461619 9.868
## eventTEDxRainier 1753792 2681359 0.654
## eventTEDxRC2 346408 2826400 0.123
## eventTEDxRiodelaPlata 1100030 2596215 0.424
## eventTEDxRotterdam 2010 1075212 2997850 0.359
## eventTEDxSaltLakeCity 738834 3461619 0.213
## eventTEDxSanDiego 453885 2997850 0.151
## eventTEDxSanJoseCA 682589 3461619 0.197
## eventTEDxSanMigueldeAllende 170578 2997850 0.057
## eventTEDxSanQuentin 2049071 3461619 0.592
## eventTEDxSantaCruz 137598 3461619 0.040
## eventTEDxSBU 1140928 3461619 0.330
## eventTEDxSeattleU 985186 3461619 0.285
## eventTEDxSeoul 1691898 3461619 0.489
## eventTEDxSF 3508340 3461619 1.013
## eventTEDxSFU 1652698 3461619 0.477
## eventTEDxSiliconValley 532055 3461619 0.154
## eventTEDxSkoll 632462 2997850 0.211
## eventTEDxSMU 592831 2826400 0.210
## eventTEDxSonomaCounty 1474137 3461619 0.426
## eventTEDxSouthBank 1714111 3461619 0.495
## eventTEDxStanford 1069571 2580139 0.415
## eventTEDxSummit 1452820 2547683 0.570
## eventTEDxSussexUniversity 24508 3461619 0.007
## eventTEDxSydney 1921570 2540134 0.756
## eventTEDxTC 1178673 2826400 0.417
## eventTEDxTeen 858890 2997850 0.287
## eventTEDxTelAviv 2010 304542 2997850 0.102
## eventTEDxThessaloniki 599151 2826400 0.212
## eventTEDxTokyo 1411932 2997850 0.471
## eventTEDxToronto 812671 3461619 0.235
## eventTEDxToronto 2010 2079520 2736650 0.760
## eventTEDxToronto 2011 407128 2997850 0.136
## eventTEDxToulouse 708021 3461619 0.205
## eventTEDxUCL 203532 3461619 0.059
## eventTEDxUdeM 468653 3461619 0.135
## eventTEDxUF 1005689 3461619 0.291
## eventTEDxUIUC 448875 3461619 0.130
## eventTEDxUM 611234 3461619 0.177
## eventTEDxUMKC 1359370 3461619 0.393
## eventTEDxUniversityofNevada 1563970 3461619 0.452
## eventTEDxUofM 1482095 3461619 0.428
## eventTEDxUSC 842377 2580139 0.326
## eventTEDxUW 5767383 3461619 1.666
## eventTEDxVancouver 826486 2736650 0.302
## eventTEDxVictoria 642961 3461619 0.186
## eventTEDxVienna 719644 2826400 0.255
## eventTEDxVirginiaTech 722197 3461619 0.209
## eventTEDxWarwick 888952 2736650 0.325
## eventTEDxWaterloo -11006 3461619 -0.003
## eventTEDxWinnipeg 1048767 3461619 0.303
## eventTEDxWitsUniversity 854045 3461619 0.247
## eventTEDxWomen 2011 878411 2616738 0.336
## eventTEDxWomen 2012 1174267 2643856 0.444
## eventTEDxYouth@Manchester 1576322 2997850 0.526
## eventTEDxYouth@Sydney 646076 2997850 0.216
## eventTEDxYYC 220143 2826400 0.078
## eventTEDxZurich 471613 3461619 0.136
## eventTEDxZurich 2011 468911 3461619 0.135
## eventTEDxZurich 2012 1311720 2997850 0.438
## eventTEDxZurich 2013 815473 3461619 0.236
## eventTEDYouth 2011 1194095 2826400 0.422
## eventTEDYouth 2012 125168 3461619 0.036
## eventTEDYouth 2013 812988 2826400 0.288
## eventTEDYouth 2014 1056465 2616738 0.404
## eventTEDYouth 2015 1621288 2681359 0.605
## eventThe Do Lectures -37497 3461619 -0.011
## eventToronto Youth Corps 878812 3461619 0.254
## eventUniversity of California 112014 2997850 0.037
## eventWeb 2.0 Expo 2008 607973 3461619 0.176
## eventWorld Science Festival 3152494 3461619 0.911
## Pr(>|t|)
## (Intercept) 0.95120
## eventArbejdsglaede Live 0.81237
## eventBBC TV 0.91439
## eventBowery Poetry Club 0.87903
## eventBusiness Innovation Factory 0.95648
## eventCarnegie Mellon University 0.90459
## eventChautauqua Institution 0.96866
## eventDICE Summit 2010 0.93110
## eventDLD 2007 0.85923
## eventEG 2007 0.72684
## eventEG 2008 0.46471
## eventElizabeth G. Anderson School 0.80007
## eventEric Whitacre's Virtual Choir 0.94256
## eventFort Worth City Council 0.97031
## eventFull Spectrum Auditions 0.71755
## eventGel Conference 0.86583
## eventGlobal Witness HQ 0.80251
## eventHandheld Learning 0.97661
## eventHarvard University 0.71669
## eventINK Conference 0.57230
## eventJustice with Michael Sandel 0.94389
## eventLIFT 2007 0.69935
## eventMichael Howard Studios 0.99374
## eventMission Blue II 0.64960
## eventMission Blue Voyage 0.87819
## eventNew York State Senate 0.96715
## eventNextGen:Charity 0.96687
## eventPrinceton University 0.88292
## eventRoyal Institution 0.96116
## eventRSA Animate 0.75935
## eventSerious Play 2008 0.76501
## eventSkoll World Forum 2007 0.93820
## eventSoulPancake 0.84465
## eventStanford University 0.13272
## eventTaste3 2008 0.73701
## eventTED Dialogues 0.72554
## eventTED Fellows 2015 0.74048
## eventTED Fellows Retreat 2013 0.78216
## eventTED Fellows Retreat 2015 0.53870
## eventTED in the Field 0.85323
## eventTED Prize Wish 0.93752
## eventTED Residency 0.76040
## eventTED Senior Fellows at TEDGlobal 2010 0.96146
## eventTED Studio 0.64483
## eventTED Talks Education 0.10378
## eventTED Talks Live 0.59155
## eventTED-Ed 0.92777
## eventTED-Ed Weekend 0.79308
## eventTED@Bangalore 0.78036
## eventTED@BCG Berlin 0.57600
## eventTED@BCG London 0.40491
## eventTED@BCG Paris 0.60573
## eventTED@BCG San Francisco 0.52555
## eventTED@BCG Singapore 0.69185
## eventTED@Cannes 0.64138
## eventTED@IBM 0.57824
## eventTED@Intel 0.78477
## eventTED@Johannesburg 0.67647
## eventTED@London 0.77846
## eventTED@MotorCity 0.84980
## eventTED@Nairobi 0.82239
## eventTED@New York 0.58385
## eventTED@NYC 0.53891
## eventTED@State 0.77180
## eventTED@State Street Boston 0.64386
## eventTED@State Street London 0.47072
## eventTED@StateStreet Boston 0.65905
## eventTED@SXSWi 0.88389
## eventTED@Unilever 0.56101
## eventTED@UPS 0.53876
## eventTED1984 0.81181
## eventTED1990 0.89179
## eventTED1994 0.90079
## eventTED1998 0.82018
## eventTED2001 0.54544
## eventTED2002 0.74762
## eventTED2003 0.69871
## eventTED2004 0.30647
## eventTED2005 0.50954
## eventTED2006 0.20689
## eventTED2007 0.58093
## eventTED2008 0.44436
## eventTED2009 0.51458
## eventTED2010 0.50390
## eventTED2011 0.46090
## eventTED2012 0.40069
## eventTED2013 0.38228
## eventTED2014 0.43488
## eventTED2015 0.45012
## eventTED2016 0.49978
## eventTED2017 0.67371
## eventTEDActive 2011 0.69554
## eventTEDActive 2014 0.66030
## eventTEDActive 2015 0.68992
## eventTEDCity2.0 0.70507
## eventTEDGlobal 2005 0.55054
## eventTEDGlobal 2007 0.86863
## eventTEDGlobal 2009 0.53533
## eventTEDGlobal 2010 0.63025
## eventTEDGlobal 2011 0.52490
## eventTEDGlobal 2012 0.43552
## eventTEDGlobal 2013 0.32371
## eventTEDGlobal 2014 0.63705
## eventTEDGlobal 2017 0.88572
## eventTEDGlobal>Geneva 0.20579
## eventTEDGlobal>London 0.45107
## eventTEDGlobalLondon 0.35728
## eventTEDIndia 2009 0.62622
## eventTEDLagos Ideas Search 0.71402
## eventTEDMED 2009 0.41232
## eventTEDMED 2010 0.90504
## eventTEDMED 2011 0.81207
## eventTEDMED 2012 0.63965
## eventTEDMED 2013 0.55517
## eventTEDMED 2014 0.53468
## eventTEDMED 2015 0.46352
## eventTEDMED 2016 0.67081
## eventTEDNairobi Ideas Search 0.87445
## eventTEDNYC 0.66346
## eventTEDPrize@UN 0.82994
## eventTEDSalon 2006 0.72932
## eventTEDSalon 2007 Hot Science 0.80557
## eventTEDSalon 2009 Compassion 0.96276
## eventTEDSalon Berlin 2014 0.54489
## eventTEDSalon London 2009 0.81583
## eventTEDSalon London 2010 0.63691
## eventTEDSalon London Fall 2011 0.48803
## eventTEDSalon London Fall 2012 0.40027
## eventTEDSalon London Spring 2011 0.73298
## eventTEDSalon London Spring 2012 0.72655
## eventTEDSalon NY2011 0.60260
## eventTEDSalon NY2012 0.76931
## eventTEDSalon NY2013 0.28893
## eventTEDSalon NY2014 0.44907
## eventTEDSalon NY2015 0.65203
## eventTEDSummit 0.57845
## eventTEDWomen 2010 0.64387
## eventTEDWomen 2013 0.36147
## eventTEDWomen 2015 0.60179
## eventTEDWomen 2016 0.63345
## eventTEDxABQ 0.96568
## eventTEDxAmazonia 0.80938
## eventTEDxAmericanRiviera 0.22134
## eventTEDxAmoskeagMillyard 0.79266
## eventTEDxAmsterdam 0.80360
## eventTEDxArendal 0.58010
## eventTEDxAsheville 0.93810
## eventTEDxAthens 0.58111
## eventTEDxAtlanta 0.88092
## eventTEDxAustin 0.70708
## eventTEDxBeaconStreet 0.41107
## eventTEDxBeirut 0.75435
## eventTEDxBend 0.34495
## eventTEDxBerkeley 0.83707
## eventTEDxBerlin 0.75931
## eventTEDxBG 0.89604
## eventTEDxBinghamtonUniversity 0.22622
## eventTEDxBloomington 0.00187 **
## eventTEDxBoston 0.71378
## eventTEDxBoston 2009 0.92016
## eventTEDxBoston 2010 0.84740
## eventTEDxBoston 2011 0.84625
## eventTEDxBoston 2012 0.77031
## eventTEDxBoulder 0.55955
## eventTEDxBoulder 2011 0.75594
## eventTEDxBratislava 0.60706
## eventTEDxBrighton 0.83361
## eventTEDxBrussels 0.70237
## eventTEDxCaFoscariU 0.76447
## eventTEDxCaltech 0.67210
## eventTEDxCambridge 0.57424
## eventTEDxCanberra 0.87258
## eventTEDxCannes 0.85963
## eventTEDxCERN 0.65039
## eventTEDxChange 0.74710
## eventTEDxChapmanU 0.39441
## eventTEDxCHUV 0.19054
## eventTEDxClaremontColleges 0.72983
## eventTEDxCMU 0.65895
## eventTEDxColbyCollege 0.71292
## eventTEDxColoradoSprings 0.80020
## eventTEDxColumbus 0.74427
## eventTEDxColumbusWomen 0.58653
## eventTEDxConcorde 0.76297
## eventTEDxConcordiaUPortland 0.36744
## eventTEDxCreativeCoast 0.01664 *
## eventTEDxCrenshaw 0.86355
## eventTEDxDanubia 0.68857
## eventTEDxDeExtinction 0.84097
## eventTEDxDelft 0.79745
## eventTEDxDesMoines 0.74412
## eventTEDxDirigo 0.99038
## eventTEDxDU 2010 0.80240
## eventTEDxDU 2011 0.94944
## eventTEDxDubai 0.67009
## eventTEDxDublin 0.89095
## eventTEDxEast 0.82905
## eventTEDxEastEnd 0.63134
## eventTEDxEdmonton 0.67974
## eventTEDxEQChCh 0.43895
## eventTEDxEuston 0.73570
## eventTEDxExeter 0.67314
## eventTEDxFiDiWomen 0.66987
## eventTEDxFrankfurt 0.82650
## eventTEDxFulbrightDublin 0.79426
## eventTEDxGatewayWomen 0.71196
## eventTEDxGeorgetown 0.88271
## eventTEDxGhent 0.81639
## eventTEDxGlasgow 0.79657
## eventTEDxGoldenGatePark 2012 0.17825
## eventTEDxGoodenoughCollege 0.92428
## eventTEDxGöteborg 2010 0.92784
## eventTEDxGrandRapids 0.70380
## eventTEDxGreatPacificGarbagePatch 0.91667
## eventTEDxGroningen 0.70655
## eventTEDxHamburg 0.91045
## eventTEDxHampshireCollege 0.83303
## eventTEDxHelvetia 0.76274
## eventTEDxHogeschoolUtrecht 0.88781
## eventTEDxHousesOfParliament 0.77758
## eventTEDxHouston 1.06e-07 ***
## eventTEDxImperialCollege 0.76495
## eventTEDxIndianapolis 0.35367
## eventTEDxIndianaUniversity 0.77608
## eventTEDxIslay 0.99319
## eventTEDxJaffa 2012 0.58633
## eventTEDxJaffa 2013 0.42175
## eventTEDxKC 0.66890
## eventTEDxKids@Ambleside 0.41798
## eventTEDxKids@Brussels 0.93357
## eventTEDxKrakow 0.82821
## eventTEDxKyoto 0.30931
## eventTEDxLeuvenSalon 0.76387
## eventTEDxLinnaeusUniversity 0.16263
## eventTEDxLondonBusinessSchool 0.83265
## eventTEDxMaastricht 0.90127
## eventTEDxManchester 0.83440
## eventTEDxManhattan 0.64056
## eventTEDxManhattanBeach 0.27256
## eventTEDxMarin 0.55376
## eventTEDxMaui 0.58099
## eventTEDxMet 0.17601
## eventTEDxMIA 0.95412
## eventTEDxMiamiUniversity 0.94069
## eventTEDxMidAtlantic 0.39214
## eventTEDxMidAtlantic 2013 0.62914
## eventTEDxMidwest 0.40980
## eventTEDxMileHigh 0.78788
## eventTEDxMonroeCorrectionalComplex 0.84810
## eventTEDxMonterey 0.99608
## eventTEDxMontreal 0.90317
## eventTEDxMtHood 0.33836
## eventTEDxMuncyStatePrison 0.73752
## eventTEDxNASA 0.70812
## eventTEDxNASA@SiliconValley 0.99860
## eventTEDxNatick 0.83232
## eventTEDxNewy 0.83448
## eventTEDxNewYork 0.53377
## eventTEDxNextGenerationAsheville 0.56291
## eventTEDxNijmegen 0.85316
## eventTEDxNorrkoping 0.06380 .
## eventTEDxNorthwesternU 0.79015
## eventTEDxNYED 0.66999
## eventTEDxO'Porto 0.96741
## eventTEDxObserver 0.99662
## eventTEDxOilSpill 0.92315
## eventTEDxOmaha 0.67224
## eventTEDxOrangeCoast 0.67021
## eventTEDxOrcasIsland 0.81103
## eventTEDxOslo 0.61597
## eventTEDxParis 2010 0.89848
## eventTEDxParis 2012 0.28642
## eventTEDxPeachtree 0.56115
## eventTEDxPenn 0.72900
## eventTEDxPennQuarter 0.50926
## eventTEDxPennsylvaniaAvenue 0.86271
## eventTEDxPerth 0.58870
## eventTEDxPhoenix 0.96070
## eventTEDxPittsburgh 0.98657
## eventTEDxPlaceDesNations 0.77409
## eventTEDxPortland 0.45107
## eventTEDxPortofSpain 0.91137
## eventTEDxProvidence 0.71484
## eventTEDxPSU 0.63742
## eventTEDxPuget Sound < 2e-16 ***
## eventTEDxRainier 0.51314
## eventTEDxRC2 0.90247
## eventTEDxRiodelaPlata 0.67182
## eventTEDxRotterdam 2010 0.71988
## eventTEDxSaltLakeCity 0.83101
## eventTEDxSanDiego 0.87967
## eventTEDxSanJoseCA 0.84370
## eventTEDxSanMigueldeAllende 0.95463
## eventTEDxSanQuentin 0.55395
## eventTEDxSantaCruz 0.96830
## eventTEDxSBU 0.74174
## eventTEDxSeattleU 0.77598
## eventTEDxSeoul 0.62506
## eventTEDxSF 0.31093
## eventTEDxSFU 0.63310
## eventTEDxSiliconValley 0.87786
## eventTEDxSkoll 0.83293
## eventTEDxSMU 0.83388
## eventTEDxSonomaCounty 0.67026
## eventTEDxSouthBank 0.62053
## eventTEDxStanford 0.67852
## eventTEDxSummit 0.56857
## eventTEDxSussexUniversity 0.99435
## eventTEDxSydney 0.44944
## eventTEDxTC 0.67670
## eventTEDxTeen 0.77452
## eventTEDxTelAviv 2010 0.91909
## eventTEDxThessaloniki 0.83214
## eventTEDxTokyo 0.63770
## eventTEDxToronto 0.81441
## eventTEDxToronto 2010 0.44741
## eventTEDxToronto 2011 0.89199
## eventTEDxToulouse 0.83795
## eventTEDxUCL 0.95312
## eventTEDxUdeM 0.89232
## eventTEDxUF 0.77144
## eventTEDxUIUC 0.89684
## eventTEDxUM 0.85986
## eventTEDxUMKC 0.69458
## eventTEDxUniversityofNevada 0.65146
## eventTEDxUofM 0.66858
## eventTEDxUSC 0.74409
## eventTEDxUW 0.09584 .
## eventTEDxVancouver 0.76268
## eventTEDxVictoria 0.85267
## eventTEDxVienna 0.79904
## eventTEDxVirginiaTech 0.83476
## eventTEDxWarwick 0.74534
## eventTEDxWaterloo 0.99746
## eventTEDxWinnipeg 0.76194
## eventTEDxWitsUniversity 0.80515
## eventTEDxWomen 2011 0.73714
## eventTEDxWomen 2012 0.65698
## eventTEDxYouth@Manchester 0.59907
## eventTEDxYouth@Sydney 0.82939
## eventTEDxYYC 0.93792
## eventTEDxZurich 0.89164
## eventTEDxZurich 2011 0.89226
## eventTEDxZurich 2012 0.66175
## eventTEDxZurich 2013 0.81378
## eventTEDYouth 2011 0.67272
## eventTEDYouth 2012 0.97116
## eventTEDYouth 2013 0.77365
## eventTEDYouth 2014 0.68645
## eventTEDYouth 2015 0.54547
## eventThe Do Lectures 0.99136
## eventToronto Youth Corps 0.79962
## eventUniversity of California 0.97020
## eventWeb 2.0 Expo 2008 0.86060
## eventWorld Science Festival 0.36255
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2448000 on 2195 degrees of freedom
## Multiple R-squared: 0.1735, Adjusted R-squared: 0.04021
## F-statistic: 1.302 on 354 and 2195 DF, p-value: 0.0003659
# most events show no significant effect on views; only four (TEDxPuget Sound, TEDxHouston, TEDxBloomington, TEDxCreativeCoast) have p < 0.05, and the adjusted R-squared is only 0.04
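With 354 event dummies for 2,550 talks, many events likely contain only a handful of talks, which would make their individual coefficients very noisy. One possible follow-up, which is not part of the original analysis, is to collapse rare events into a single "Other" level before fitting. The helper name `lump_rare_levels` and the threshold of 10 talks are assumptions chosen for illustration, and the event counts in the toy example are made up.

```r
# Hypothetical helper (not part of the original analysis): collapse factor
# levels that occur fewer than `min_n` times into a single "Other" level.
lump_rare_levels <- function(x, min_n = 10, other = "Other") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_n]
  x[x %in% rare] <- other
  factor(x)
}

# Toy example with made-up counts, only to show the behaviour
events <- c(rep("TED2009", 12), rep("TEDxHouston", 1), rep("TEDxUW", 2))
lumped <- lump_rare_levels(events, min_n = 10)
table(lumped)

# With the real data this might be used as (assumes `ted` is loaded as above):
# ted$event_lumped <- lump_rare_levels(ted$event, min_n = 10)
# summary(lm(views ~ event_lumped, data = ted))
```

Lumping trades detail for stability: single-talk events no longer get their own coefficient, so a lone viral talk cannot show up as a "significant event".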
summary(lm(views ~ ted$date_pub, data = ted))
##
## Call:
## lm(formula = views ~ ted$date_pub, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1685735 -957633 -557580 12248 45437906
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2334019.30 704368.13 3.314 0.000934 ***
## ted$date_pub -40.88 45.19 -0.905 0.365669
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2499000 on 2548 degrees of freedom
## Multiple R-squared: 0.0003212, Adjusted R-squared: -7.116e-05
## F-statistic: 0.8186 on 1 and 2548 DF, p-value: 0.3657
# the date published explains almost none of the variance; its coefficient is very small and not significant
#fit_occ <- lm(views ~ factor(ted$speaker_occupation), data = ted)
#summary(fit_occ)
Only a few occupations were significant in predicting view counts:
factor(ted$speaker_occupation)Author/educator < 2e-16 ***
factor(ted$speaker_occupation)Autonomous systems pioneer 0.012050 *
factor(ted$speaker_occupation)Beatboxer 0.006905 **
factor(ted$speaker_occupation)Blogger 0.001010 **
factor(ted$speaker_occupation)Bionics designer 0.022110 *
factor(ted$speaker_occupation)Anthropologist, expert on love 0.023640 *
These occupations belong to the speakers of some of the most popular TED talks; for example, Author/educator is the occupation of Ken Robinson, who gave the most popular TED talk of all time as well as other popular talks. However, the regression still wasn’t successful: while the R-squared was 0.53, the adjusted R-squared was -0.08 because of the very large number of predictors.
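The gap between R-squared and adjusted R-squared follows from the standard adjustment, which penalizes each extra predictor. A minimal sketch, using the figures reported for the event regression above (R-squared 0.1735, 2,550 talks, 354 event dummies); the function name `adjusted_r2` is only illustrative:

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Event model: R-squared 0.1735, 2550 talks, 354 event dummies
adj_event <- adjusted_r2(r2 = 0.1735, n = 2550, p = 354)
round(adj_event, 3)  # should land close to the 0.0402 in the summary output
```

The same penalty explains how a model with hundreds of occupation levels can have a sizeable R-squared and still end up with a negative adjusted R-squared.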
# Adding variables one-by-one
fit6 <- lm(views ~ comments + languages, data = ted)
fit7 <- lm(views ~ comments + languages + num_tags, data=ted)
fit8 <- lm(views ~ comments + languages + num_tags + weekend , data=ted)
fit9 <- lm(views ~ comments + languages + num_tags + weekend + duration, data=ted)
summary(fit1) #just comments
##
## Call:
## lm(formula = views ~ comments, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26514433 -667711 -254429 229873 31596994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 798186.6 50681.6 15.75 <2e-16 ***
## comments 4698.8 148.6 31.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2118000 on 2548 degrees of freedom
## Multiple R-squared: 0.2819, Adjusted R-squared: 0.2816
## F-statistic: 1000 on 1 and 2548 DF, p-value: < 2.2e-16
summary(fit6)
##
## Call:
## lm(formula = views ~ comments + languages, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23341925 -730945 -226309 394521 31533398
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -733892.8 123037.9 -5.965 2.79e-09 ***
## comments 4044.9 151.4 26.721 < 2e-16 ***
## languages 60650.3 4468.6 13.573 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2045000 on 2547 degrees of freedom
## Multiple R-squared: 0.3303, Adjusted R-squared: 0.3298
## F-statistic: 628.2 on 2 and 2547 DF, p-value: < 2.2e-16
summary(fit7)
##
## Call:
## lm(formula = views ~ comments + languages + num_tags, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23464948 -738986 -220033 353133 31476369
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -993507.0 152609.0 -6.510 9.01e-11 ***
## comments 4071.5 151.4 26.884 < 2e-16 ***
## languages 62446.0 4506.0 13.858 < 2e-16 ***
## num_tags 27351.3 9536.7 2.868 0.00416 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2042000 on 2546 degrees of freedom
## Multiple R-squared: 0.3325, Adjusted R-squared: 0.3317
## F-statistic: 422.7 on 3 and 2546 DF, p-value: < 2.2e-16
summary(fit8)
##
## Call:
## lm(formula = views ~ comments + languages + num_tags + weekend,
## data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23490106 -738708 -216928 350143 31474583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -975622.2 158508.7 -6.155 8.7e-10 ***
## comments 4076.0 151.9 26.841 < 2e-16 ***
## languages 61949.1 4660.7 13.292 < 2e-16 ***
## num_tags 27156.9 9549.5 2.844 0.00449 **
## weekend -86147.3 205940.5 -0.418 0.67575
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2043000 on 2545 degrees of freedom
## Multiple R-squared: 0.3325, Adjusted R-squared: 0.3315
## F-statistic: 317 on 4 and 2545 DF, p-value: < 2.2e-16
summary(fit9)
##
## Call:
## lm(formula = views ~ comments + languages + num_tags + weekend +
## duration, data = ted)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23061497 -729849 -236805 353088 31452340
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1455238.1 209605.7 -6.943 4.86e-12 ***
## comments 3931.5 157.1 25.027 < 2e-16 ***
## languages 68223.0 4986.4 13.682 < 2e-16 ***
## num_tags 26625.6 9529.9 2.794 0.005247 **
## weekend -41407.3 205890.7 -0.201 0.840626
## duration 408.8 117.3 3.487 0.000497 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2038000 on 2544 degrees of freedom
## Multiple R-squared: 0.3357, Adjusted R-squared: 0.3344
## F-statistic: 257.1 on 5 and 2544 DF, p-value: < 2.2e-16
library(stargazer)
stargazer(fit1, fit2, fit3, fit4, fit6, fit7, fit8, fit9, type="text", title="All Models Compared", align=TRUE, no.space=TRUE, font.size = "footnotesize")
##
## All Models Compared
## =====================================================================================================================================================================================================================================
## Dependent variable:
## -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## views
## (1) (2) (3) (4) (5) (6) (7) (8)
## -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## comments 4,698.788*** 4,044.862*** 4,071.480*** 4,076.027*** 3,931.536***
## (148.571) (151.374) (151.444) (151.858) (157.090)
## languages 98,655.110*** 60,650.310*** 62,446.000*** 61,949.050*** 68,222.950***
## (4,792.403) (4,468.592) (4,505.978) (4,660.659) (4,986.410)
## duration 325.599** 408.844***
## (132.184) (117.251)
## weekend -836,986.200***
## (243,014.000)
## num_tags 27,351.270*** 27,156.920*** 26,625.560***
## (9,536.676) (9,549.530) (9,529.882)
## weekend -86,147.350 -41,407.250
## (205,940.500) (205,890.700)
## Constant 798,186.600*** -997,579.200*** 1,429,187.000*** 1,734,403.000*** -733,892.800*** -993,507.000*** -975,622.200*** -1,455,238.000***
## (50,681.560) (138,743.900) (119,912.200) (50,472.810) (123,037.900) (152,609.000) (158,508.700) (209,605.700)
## -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## Observations 2,550 2,550 2,550 2,550 2,550 2,550 2,550 2,550
## R2 0.282 0.143 0.002 0.005 0.330 0.332 0.333 0.336
## Adjusted R2 0.282 0.142 0.002 0.004 0.330 0.332 0.331 0.334
## Residual Std. Error 2,117,652.000 (df = 2548) 2,313,944.000 (df = 2548) 2,496,000.000 (df = 2548) 2,493,173.000 (df = 2548) 2,045,392.000 (df = 2547) 2,042,496.000 (df = 2546) 2,042,827.000 (df = 2545) 2,038,364.000 (df = 2544)
## F Statistic 1,000.232*** (df = 1; 2548) 423.772*** (df = 1; 2548) 6.068** (df = 1; 2548) 11.862*** (df = 1; 2548) 628.185*** (df = 2; 2547) 422.720*** (df = 3; 2546) 316.981*** (df = 4; 2545) 257.128*** (df = 5; 2544)
## =====================================================================================================================================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
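One way to judge whether the extra variables in model (8) add anything beyond model (5) is a partial F-test on the nested models. The sketch below computes it from the rounded R-squared values and residual degrees of freedom in the `summary()` output above, so the result is approximate; the function name `partial_f_test` is an assumption for illustration. With the data loaded, `anova(fit6, fit9)` should give the exact version of the same test.

```r
# Partial F-test for nested linear models, computed from reported summaries.
# r2_reduced / r2_full: R-squared of the smaller and the larger model
# q: number of extra predictors; df_full: residual df of the larger model
partial_f_test <- function(r2_reduced, r2_full, q, df_full) {
  f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
  p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)
  list(f = f_stat, p = p_value)
}

# Model (5) vs model (8): rounded values taken from the summary() output above
res <- partial_f_test(r2_reduced = 0.3303, r2_full = 0.3357, q = 3, df_full = 2544)
res$f
res$p
```

With these rounded inputs the statistic comes out at roughly 6.9, which suggests that the three added variables are jointly significant even though the gain in R-squared is small.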
The chosen model is the last one, model (8), since it had the highest R-squared and adjusted R-squared, although it offers only a marginal improvement over model (5), which uses only comments and languages translated.
Model 5: Y(views) = β_0 + β_1 comments + β_2 languages + ϵ
Model 8: Y(views) = β_0 + β_1 comments + β_2 languages + β_3 numtags + β_4 isweekend + β_5 duration + ϵ
For prediction purposes, I would choose model 8 with all variables. For explanatory purposes, I would choose model 5, since comments and languages are by far the most strongly correlated with views and account for most of the variance these models explain.
Model 5 suggests that every additional comment is associated with 4,044 more views (p-value under 0.01) and that every additional language translated is associated with 60,650 more views (p-value under 0.01). However, the constant is negative (-733,893 views), which makes no sense for a view count, but that comes with the restriction of a linear model. Together these two variables explained 0.33 of the variance (both R-squared and adjusted R-squared), with an F-statistic of 628.2:
Y(views) = -733893 + 4044 comments + 60650 languages
Adding the other variables in model 8 slightly improved the R-squared to 0.336 and the adjusted R-squared to 0.334. So, if we are after accuracy for prediction, I would use this latter model:
Y(views) = -1455238 + 3931 comments + 68222 languages + 408 duration + 26625 numtags - 41407 isweekend
The results, and particularly model 8, show overall significance. Most variables are significant; weekend is not (p = 0.84), and adding it barely changed the explanatory power (the R-squared stayed at 0.3325), but I’m keeping it. The F-statistic is lower than in model 5 (257.1 versus 628.2), and the R-squared and adjusted R-squared are not great at 0.336 and 0.334 respectively, but they are the best out of this set of models. The constant decreased much further, so the variables have to contribute more to raise the predicted view count.
The coefficient (the estimated effect) of comments decreased from 4044 to 3931, while the coefficient for the number of languages rose and the newly added variables received their own coefficients: 408 more views for every additional second of duration and 26625 more views for every additional tag, offset by 41407 fewer predicted views if the talk was published on a weekend (an effect that is not statistically significant).
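To make the fitted equation concrete, here is a minimal sketch that plugs values into model 8 using the coefficients printed by `summary(fit9)`. The function name `predict_views` and the example talk are hypothetical, and `weekend` is assumed to be coded as 0/1. With the data loaded, `predict(fit9, newdata = ...)` would do the same job.

```r
# Coefficients copied from the summary(fit9) output above
coef_model8 <- c(
  intercept = -1455238.1,
  comments  = 3931.5,
  languages = 68223.0,
  num_tags  = 26625.6,
  weekend   = -41407.3,
  duration  = 408.8
)

# Predicted views for a talk; `weekend` is assumed to be 1 if the talk was
# published on a weekend and 0 otherwise, `duration` is in seconds
predict_views <- function(comments, languages, num_tags, weekend, duration,
                          coefs = coef_model8) {
  unname(coefs["intercept"] +
    coefs["comments"] * comments +
    coefs["languages"] * languages +
    coefs["num_tags"] * num_tags +
    coefs["weekend"] * weekend +
    coefs["duration"] * duration)
}

# Hypothetical talk: 100 comments, 25 languages, 7 tags, weekday, 800 seconds
predict_views(comments = 100, languages = 25, num_tags = 7, weekend = 0, duration = 800)
```

Setting every input to zero returns the intercept, a prediction of about -1.46 million views, which illustrates why the negative constant is an artifact of the linear form and not a meaningful estimate.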
In conclusion, this very limited model does not convey a causal relationship well, because the fundamental problem of causal inference is not addressed by these variables. The predictors are not independent of the y variable but closely related to it (especially comments and number of languages, which are naturally the best predictors). I don’t believe that we could have reached a causal inference with the available numerical predictors. Future attempts might use the transcript of the talk to analyze its content, the audio to measure the level of clapping, or the visuals of the talk and the clothing of the speaker, to better predict views from the content of the talk.
Implications
However, we can see some correlations, even if they are not causal. So, a higher number of views for a talk is likelier if you:
Increase the number of comments on the talk! (get all of your friends to comment and discuss)
Increase the languages translated! (get your friends or freelancers to translate)
There is no need to make it too short! (probably only works if the talk is really good)
Tag it with more topics!
Whatever you do, do NOT publish it on a weekend. Publish it on a Friday!
So, go increase your TED Talk’s view count and comment if these strategies worked or not! (And tell me about it, since that would be a helpful small experiment which could reaffirm or reject these results!)