Summary Statistics

Here are some summary statistics: a few tables and graphs that describe the data.

1 - Histograms

Let’s examine the distributions of the key parameters that we’ll be using. To the right are histograms of most of our numerical variables. Below are some more detailed histograms, with a white line for the median and a blue line for the mean.
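As a rough sketch of how such a detailed histogram can be drawn, the base R snippet below marks the median with a white line and the mean with a blue line. The data are synthetic (a right-skewed log-normal sample standing in for a column such as `comments`), and `plot_hist_with_centers` is a hypothetical helper name, not part of the original analysis.

```r
# Synthetic, right-skewed stand-in data (NOT the TED data set)
set.seed(1)
x <- rlnorm(2000, meanlog = log(118), sdlog = 1)

# Hypothetical helper: histogram with median (white) and mean (blue) lines
plot_hist_with_centers <- function(values, main = "", bins = 30) {
  med <- median(values, na.rm = TRUE)
  avg <- mean(values, na.rm = TRUE)
  # `breaks` given as a single number is only a suggested bin count
  hist(values, breaks = bins, col = "grey30", border = "grey30",
       main = main, xlab = "value")
  abline(v = med, col = "white", lwd = 2)  # median
  abline(v = avg, col = "blue", lwd = 2)   # mean
  invisible(c(median = med, mean = avg))
}

centers <- plot_hist_with_centers(x, main = "Synthetic right-skewed data")
```

For right-skewed data the blue mean line falls to the right of the white median line, which is the same pattern the `comments` summary shows (median 118, mean 191.6).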

Histograms of all numerical variables

print(summary(ted))
##     comments     
##  Min.   :   2.0  
##  1st Qu.:  63.0  
##  Median : 118.0  
##  Mean   : 191.6  
##  3rd Qu.: 221.8  
##  Max.   :6404.0  
##                  
##                                                                                                                                                                                                                                                                                                                                                                                                                                                             description  
##  'I am a mathematician, and I would like to stand on your roof.' That is how Ron Eglash greeted many African families he met while researching the fractal patterns he'd noticed in villages across the continent.                                                                                                                                                                                                                                                :   1  
##  "A forest is much more than what you see," says ecologist Suzanne Simard. Her 30 years of research in Canadian forests have led to an astounding discovery -- trees talk, often and over vast distances. Learn more about the harmonious yet complicated social lives of trees and prepare to see the natural world with new eyes.                                                                                                                               :   1  
##  "An advanced city is not one where even the poor use cars, but rather one where even the rich use public transport," argues Enrique Peñalosa. In this spirited talk, the mayor of Bogotá shares some of the tactics he used to change the transportation dynamic in the Colombian capital... and suggests ways to think about building smart cities of the future.                                                                                               :   1  
##  "Anything that is worth pursuing is going to require us to suffer, just a little bit," says surf photographer Chris Burkard, as he explains his obsession with the coldest, choppiest, most isolated beaches on earth. With jawdropping photos and stories of places few humans have ever seen -- much less surfed -- he draws us into his "personal crusade against the mundane."                                                                               :   1  
##  "Architecture is not about math or zoning -- it's about visceral emotions," says Marc Kushner. In a sweeping — often funny — talk, he zooms through the past thirty years of architecture to show how the public, once disconnected, have become an essential part of the design process. With the help of social media, feedback reaches architects years before a building is even created. The result? Architecture that will do more for us than ever before.:   1  
##  "Babies and young children are like the R&D division of the human species," says psychologist Alison Gopnik. Her research explores the sophisticated intelligence-gathering and decision-making that babies are really doing when they play.                                                                                                                                                                                                                     :   1  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                          :2544  
##     duration          event        film_date           languages    
##  Min.   : 135.0   TED2014:  84   Min.   :7.465e+07   Min.   : 0.00  
##  1st Qu.: 577.0   TED2009:  83   1st Qu.:1.257e+09   1st Qu.:23.00  
##  Median : 848.0   TED2013:  77   Median :1.333e+09   Median :28.00  
##  Mean   : 826.5   TED2016:  77   Mean   :1.322e+09   Mean   :27.33  
##  3rd Qu.:1046.8   TED2015:  75   3rd Qu.:1.413e+09   3rd Qu.:33.00  
##  Max.   :5256.0   TED2011:  70   Max.   :1.504e+09   Max.   :72.00  
##                   (Other):2084                                      
##         main_speaker 
##  Hans Rosling :   9  
##  Juan Enriquez:   7  
##  Marco Tempest:   6  
##  Rives        :   6  
##  Bill Gates   :   5  
##  Clay Shirky  :   5  
##  (Other)      :2512  
##                                                                          name     
##  Aakash Odedra: A dance in a hurricane of paper, wind and light            :   1  
##  Aala El-Khani: What it's like to be a parent in a war zone                :   1  
##  Aaron Huey: America's native prisoners of war                             :   1  
##  Aaron Koblin: Visualizing ourselves ... with crowd-sourced data           :   1  
##  Aaron O'Connell: Making sense of a visible quantum object                 :   1  
##  Abe Davis: New video technology that reveals an object's hidden properties:   1  
##  (Other)                                                                   :2544  
##   num_speaker    published_date     
##  Min.   :1.000   Min.   :1.151e+09  
##  1st Qu.:1.000   1st Qu.:1.268e+09  
##  Median :1.000   Median :1.341e+09  
##  Mean   :1.028   Mean   :1.344e+09  
##  3rd Qu.:1.000   3rd Qu.:1.423e+09  
##  Max.   :5.000   Max.   :1.506e+09  
##                                     
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     ratings    
##  [{'id': 1, 'name': 'Beautiful', 'count': 100}, {'id': 9, 'name': 'Ingenious', 'count': 97}, {'id': 3, 'name': 'Courageous', 'count': 78}, {'id': 8, 'name': 'Informative', 'count': 313}, {'id': 10, 'name': 'Inspiring', 'count': 518}, {'id': 21, 'name': 'Unconvincing', 'count': 30}, {'id': 22, 'name': 'Fascinating', 'count': 158}, {'id': 11, 'name': 'Longwinded', 'count': 31}, {'id': 7, 'name': 'Funny', 'count': 12}, {'id': 25, 'name': 'OK', 'count': 59}, {'id': 23, 'name': 'Jaw-dropping', 'count': 72}, {'id': 24, 'name': 'Persuasive', 'count': 174}, {'id': 26, 'name': 'Obnoxious', 'count': 14}, {'id': 2, 'name': 'Confusing', 'count': 15}]  :   1  
##  [{'id': 1, 'name': 'Beautiful', 'count': 101}, {'id': 3, 'name': 'Courageous', 'count': 141}, {'id': 10, 'name': 'Inspiring', 'count': 252}, {'id': 11, 'name': 'Longwinded', 'count': 20}, {'id': 8, 'name': 'Informative', 'count': 92}, {'id': 22, 'name': 'Fascinating', 'count': 36}, {'id': 2, 'name': 'Confusing', 'count': 15}, {'id': 24, 'name': 'Persuasive', 'count': 39}, {'id': 21, 'name': 'Unconvincing', 'count': 10}, {'id': 26, 'name': 'Obnoxious', 'count': 11}, {'id': 9, 'name': 'Ingenious', 'count': 19}, {'id': 25, 'name': 'OK', 'count': 28}, {'id': 23, 'name': 'Jaw-dropping', 'count': 7}, {'id': 7, 'name': 'Funny', 'count': 5}]      :   1  
##  [{'id': 1, 'name': 'Beautiful', 'count': 101}, {'id': 3, 'name': 'Courageous', 'count': 84}, {'id': 10, 'name': 'Inspiring', 'count': 199}, {'id': 22, 'name': 'Fascinating', 'count': 83}, {'id': 9, 'name': 'Ingenious', 'count': 61}, {'id': 8, 'name': 'Informative', 'count': 67}, {'id': 24, 'name': 'Persuasive', 'count': 13}, {'id': 11, 'name': 'Longwinded', 'count': 2}, {'id': 25, 'name': 'OK', 'count': 10}, {'id': 7, 'name': 'Funny', 'count': 2}, {'id': 21, 'name': 'Unconvincing', 'count': 2}, {'id': 23, 'name': 'Jaw-dropping', 'count': 23}, {'id': 2, 'name': 'Confusing', 'count': 4}, {'id': 26, 'name': 'Obnoxious', 'count': 1}]          :   1  
##  [{'id': 1, 'name': 'Beautiful', 'count': 101}, {'id': 7, 'name': 'Funny', 'count': 306}, {'id': 9, 'name': 'Ingenious', 'count': 106}, {'id': 11, 'name': 'Longwinded', 'count': 54}, {'id': 3, 'name': 'Courageous', 'count': 39}, {'id': 10, 'name': 'Inspiring', 'count': 230}, {'id': 8, 'name': 'Informative', 'count': 360}, {'id': 21, 'name': 'Unconvincing', 'count': 184}, {'id': 22, 'name': 'Fascinating', 'count': 298}, {'id': 2, 'name': 'Confusing', 'count': 34}, {'id': 25, 'name': 'OK', 'count': 66}, {'id': 23, 'name': 'Jaw-dropping', 'count': 54}, {'id': 26, 'name': 'Obnoxious', 'count': 47}, {'id': 24, 'name': 'Persuasive', 'count': 29}]:   1  
##  [{'id': 1, 'name': 'Beautiful', 'count': 102}, {'id': 3, 'name': 'Courageous', 'count': 103}, {'id': 23, 'name': 'Jaw-dropping', 'count': 40}, {'id': 10, 'name': 'Inspiring', 'count': 236}, {'id': 11, 'name': 'Longwinded', 'count': 18}, {'id': 25, 'name': 'OK', 'count': 22}, {'id': 21, 'name': 'Unconvincing', 'count': 18}, {'id': 26, 'name': 'Obnoxious', 'count': 12}, {'id': 22, 'name': 'Fascinating', 'count': 69}, {'id': 24, 'name': 'Persuasive', 'count': 64}, {'id': 9, 'name': 'Ingenious', 'count': 2}, {'id': 8, 'name': 'Informative', 'count': 80}, {'id': 2, 'name': 'Confusing', 'count': 8}, {'id': 7, 'name': 'Funny', 'count': 16}]      :   1  
##  [{'id': 1, 'name': 'Beautiful', 'count': 102}, {'id': 8, 'name': 'Informative', 'count': 191}, {'id': 24, 'name': 'Persuasive', 'count': 124}, {'id': 3, 'name': 'Courageous', 'count': 87}, {'id': 10, 'name': 'Inspiring', 'count': 107}, {'id': 22, 'name': 'Fascinating', 'count': 109}, {'id': 9, 'name': 'Ingenious', 'count': 58}, {'id': 26, 'name': 'Obnoxious', 'count': 8}, {'id': 2, 'name': 'Confusing', 'count': 2}, {'id': 25, 'name': 'OK', 'count': 17}, {'id': 7, 'name': 'Funny', 'count': 16}, {'id': 23, 'name': 'Jaw-dropping', 'count': 14}, {'id': 11, 'name': 'Longwinded', 'count': 4}, {'id': 21, 'name': 'Unconvincing', 'count': 4}]      :   1  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                :2544  
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              related_talks 
##  [{'id': 1, 'hero': 'https://pe.tedcdn.com/images/ted/a2194b4ef5170bd9e9b086c5053e09cdf3545960_2880x1620.jpg', 'speaker': 'Al Gore', 'title': 'Averting the climate crisis', 'duration': 977, 'slug': 'al_gore_on_averting_climate_crisis', 'viewed_count': 3200590}, {'id': 346, 'hero': 'https://pe.tedcdn.com/images/ted/53153_480x360.jpg', 'speaker': 'Brewster Kahle', 'title': 'A free digital library', 'duration': 1206, 'slug': 'brewster_kahle_builds_a_free_digital_library', 'viewed_count': 445784}, {'id': 1410, 'hero': 'https://pe.tedcdn.com/images/ted/ec8292db13815de32529bf4568ce76b48a089bc5_2880x1620.jpg', 'speaker': 'Chip Kidd', 'title': 'Designing books is no laughing matter. OK, it is.', 'duration': 1036, 'slug': 'chip_kidd_designing_books_is_no_laughing_matter_ok_it_is', 'viewed_count': 1752006}, {'id': 2180, 'hero': 'https://pe.tedcdn.com/images/ted/2ff057a4bb41b2246c65370cf104a8ffa0dfc75a_2880x1620.jpg', 'speaker': 'Brian Dettmer', 'title': 'Old books reborn as art', 'duration': 366, 'slug': 'brian_dettmer_old_books_reborn_as_intricate_art', 'viewed_count': 1159918}, {'id': 2382, 'hero': 'https://pe.tedcdn.com/images/ted/f0492898d0b7658a39ac9348f555f11e936b5068_2880x1620.jpg', 'speaker': 'Ann Morgan', 'title': 'My year reading a book from every country in the world', 'duration': 723, 'slug': 'ann_morgan_my_year_reading_a_book_from_every_country_in_the_world', 'viewed_count': 1446231}, {'id': 1366, 'hero': 'https://pe.tedcdn.com/images/ted/7e5896a6ead7ad1d0f09c608faf7fa1b090a8dff_800x600.jpg', 'speaker': 'Shilo Shiv Suleman', 'title': 'Using tech to enable dreaming', 'duration': 456, 'slug': 'shilo_shiv_suleman_using_tech_to_enable_dreaming', 'viewed_count': 576917}]                   :   1  
##  [{'id': 1, 'hero': 'https://pe.tedcdn.com/images/ted/a2194b4ef5170bd9e9b086c5053e09cdf3545960_2880x1620.jpg', 'speaker': 'Al Gore', 'title': 'Averting the climate crisis', 'duration': 977, 'slug': 'al_gore_on_averting_climate_crisis', 'viewed_count': 3200659}, {'id': 628, 'hero': 'https://pe.tedcdn.com/images/ted/113028_800x600.jpg', 'speaker': 'James Balog', 'title': 'Time-lapse proof of extreme ice loss', 'duration': 1162, 'slug': 'james_balog_time_lapse_proof_of_extreme_ice_loss', 'viewed_count': 914178}, {'id': 1380, 'hero': 'https://pe.tedcdn.com/images/ted/3d552c9054e770953216a6b5aab1b748b3e5f823_2880x1620.jpg', 'speaker': 'James Hansen', 'title': 'Why I must speak out about climate change', 'duration': 1071, 'slug': 'james_hansen_why_i_must_speak_out_about_climate_change', 'viewed_count': 1243947}, {'id': 2816, 'hero': 'https://pe.tedcdn.com/images/ted/07284d43b7d248835825748dc24c63e4b83478c9_2880x1620.jpg', 'speaker': 'Kate Marvel', 'title': 'Can clouds buy us more time to solve climate change?', 'duration': 787, 'slug': 'kate_marvel_can_clouds_buy_us_more_time_to_solve_climate_change', 'viewed_count': 907550}, {'id': 192, 'hero': 'https://pe.tedcdn.com/images/ted/2412c9764c6b9bed2d9bd55ec7a6b9e540297839_1600x1200.jpg', 'speaker': 'David Keith', 'title': 'A critical look at geoengineering against climate change', 'duration': 958, 'slug': 'david_keith_s_surprising_ideas_on_climate_change', 'viewed_count': 876780}, {'id': 945, 'hero': 'https://pe.tedcdn.com/images/ted/195010_800x600.jpg', 'speaker': 'Johan Rockstrom', 'title': 'Let the environment guide our development', 'duration': 1090, 'slug': 'johan_rockstrom_let_the_environment_guide_our_development', 'viewed_count': 924840}]:   1  
##  [{'id': 10, 'hero': 'https://pe.tedcdn.com/images/ted/671d08b846a625bc2fdee1b6e3c7326bd9c3c48c_1600x1200.jpg', 'speaker': 'Dean Ornish', 'title': "The killer American diet that's sweeping the planet", 'duration': 198, 'slug': 'dean_ornish_on_the_world_s_killer_diet', 'viewed_count': 2299281}, {'id': 263, 'hero': 'https://pe.tedcdn.com/images/ted/e53dd444ed289f36275590ba3c93ddc161cf83fa_1600x1200.jpg', 'speaker': 'Mark Bittman', 'title': "What's wrong with what we eat", 'duration': 1208, 'slug': 'mark_bittman_on_what_s_wrong_with_what_we_eat', 'viewed_count': 3830694}, {'id': 348, 'hero': 'https://pe.tedcdn.com/images/ted/aaeea21df3e61b5ecb347f5b024e519b178547ef_1600x1200.jpg', 'speaker': 'Ann Cooper', 'title': "What's wrong with school lunches", 'duration': 1182, 'slug': 'ann_cooper_talks_school_lunches', 'viewed_count': 1137781}, {'id': 2659, 'hero': 'https://pe.tedcdn.com/images/ted/500c56050b598bbd5837bc0d378a5ba6301e71c0_2880x1620.jpg', 'speaker': 'Sam Kass', 'title': 'Want kids to learn well? Feed them well', 'duration': 728, 'slug': 'sam_kass_want_to_teach_kids_well_feed_them_well', 'viewed_count': 1345260}, {'id': 910, 'hero': 'https://pe.tedcdn.com/images/ted/181989_800x600.jpg', 'speaker': 'Ellen  Gustafson', 'title': 'Obesity + hunger = 1 global food issue', 'duration': 675, 'slug': 'ellen_gustafson_obesity_hunger_1_global_food_issue', 'viewed_count': 664485}, {'id': 1199, 'hero': 'https://pe.tedcdn.com/images/ted/e975f00df4827c6fbfc27ad3ffa820f1c609ce25_800x600.jpg', 'speaker': 'Josette Sheeran', 'title': 'Ending hunger now', 'duration': 1150, 'slug': 'josette_sheeran_ending_hunger_now', 'viewed_count': 812506}]                                                                 :   1  
##  [{'id': 1000, 'hero': 'https://pe.tedcdn.com/images/ted/51f652b9ff6854867d1d7abb2683caf1d8dd22fb_800x600.jpg', 'speaker': 'Gero Miesenboeck', 'title': 'Re-engineering the brain', 'duration': 1054, 'slug': 'gero_miesenboeck', 'viewed_count': 611083}, {'id': 724, 'hero': 'https://pe.tedcdn.com/images/ted/138921_800x600.jpg', 'speaker': 'Vilayanur Ramachandran', 'title': 'The neurons that shaped civilization', 'duration': 463, 'slug': 'vs_ramachandran_the_neurons_that_shaped_civilization', 'viewed_count': 1939394}, {'id': 659, 'hero': 'https://pe.tedcdn.com/images/ted/121608_800x600.jpg', 'speaker': 'Henry Markram', 'title': 'A brain in a supercomputer', 'duration': 890, 'slug': 'henry_markram_supercomputing_the_brain_s_secrets', 'viewed_count': 1180135}, {'id': 1450, 'hero': 'https://pe.tedcdn.com/images/ted/9d94c91442c28bfda8fc51070e63397867e1f787_800x600.jpg', 'speaker': 'Carl Schoonover', 'title': 'How to look inside the brain', 'duration': 292, 'slug': 'carl_schoonover_how_to_look_inside_the_brain', 'viewed_count': 890280}, {'id': 1349, 'hero': 'https://pe.tedcdn.com/images/ted/b6441a3ec7fe888555c728e011b6bd3650544fb8_800x600.jpg', 'speaker': 'Neil Burgess', 'title': 'How your brain tells you where you are', 'duration': 543, 'slug': 'neil_burgess_how_your_brain_tells_you_where_you_are', 'viewed_count': 1196431}, {'id': 2429, 'hero': 'https://pe.tedcdn.com/images/ted/0fc9a0c0e51919aa2f0efe825dbea9d4480baaf7_2880x1620.jpg', 'speaker': 'Jocelyne Bloch', 'title': 'The brain may be able to repair itself -- with help', 'duration': 694, 'slug': 'jocelyne_bloch_the_brain_may_be_able_to_repair_itself_with_help', 'viewed_count': 2152829}]                                                         :   1  
##  [{'id': 1000, 'hero': 'https://pe.tedcdn.com/images/ted/51f652b9ff6854867d1d7abb2683caf1d8dd22fb_800x600.jpg', 'speaker': 'Gero Miesenboeck', 'title': 'Re-engineering the brain', 'duration': 1054, 'slug': 'gero_miesenboeck', 'viewed_count': 611085}, {'id': 1674, 'hero': 'https://pe.tedcdn.com/images/ted/358f8320e4a3b2a9f21dc6ae0585b8ddc071c1a5_1600x1200.jpg', 'speaker': 'Michael Dickinson', 'title': 'How a fly flies', 'duration': 955, 'slug': 'michael_dickinson_how_a_fly_flies', 'viewed_count': 1555723}, {'id': 1879, 'hero': 'https://pe.tedcdn.com/images/ted/f8443ec450dd71d590ea1b0f7b9470dd5665b15b_2880x1620.jpg', 'speaker': 'Suzana Herculano-Houzel', 'title': 'What is so special about the human brain?', 'duration': 811, 'slug': 'suzana_herculano_houzel_what_is_so_special_about_the_human_brain', 'viewed_count': 2454843}, {'id': 967, 'hero': 'https://pe.tedcdn.com/images/ted/202580_800x600.jpg', 'speaker': 'Sebastian Seung', 'title': 'I am my connectome', 'duration': 1165, 'slug': 'sebastian_seung', 'viewed_count': 942938}, {'id': 2557, 'hero': 'https://pe.tedcdn.com/images/ted/eec0be6feefef0950540ec844664ed485ec83d8d_2880x1620.jpg', 'speaker': 'Ed Boyden', 'title': "A new way to study the brain's invisible secrets", 'duration': 795, 'slug': 'ed_boyden_baby_diapers_inspired_this_new_way_to_study_the_brain', 'viewed_count': 1260062}, {'id': 1714, 'hero': 'https://pe.tedcdn.com/images/ted/1733fa238431c2ae65c5410920b2413c2c4b8171_1600x1200.jpg', 'speaker': 'Thomas Insel', 'title': 'Toward a new understanding of mental illness', 'duration': 783, 'slug': 'thomas_insel_toward_a_new_understanding_of_mental_illness', 'viewed_count': 1188149}]                                                      :   1  
##  [{'id': 1001, 'hero': 'https://pe.tedcdn.com/images/ted/6f9f510c9443fa223d292d43a3e9999d8b1347cc_2880x1620.jpg', 'speaker': 'Andrew Bird', 'title': 'A one-man orchestra of the imagination', 'duration': 1159, 'slug': 'andrew_bird_s_one_man_orchestra_of_the_imagination', 'viewed_count': 1231425}, {'id': 286, 'hero': 'https://pe.tedcdn.com/images/ted/db471b06b2f5f6ba97c7de8b232878ffe9718600_1600x1200.jpg', 'speaker': 'Benjamin Zander', 'title': 'The transformative power of classical music', 'duration': 1243, 'slug': 'benjamin_zander_on_music_and_passion', 'viewed_count': 9315564}, {'id': 745, 'hero': 'https://pe.tedcdn.com/images/ted/143427_800x600.jpg', 'speaker': 'Sivamani', 'title': 'Rhythm is everything, everywhere', 'duration': 1000, 'slug': 'sivamani_rhythm_is_everything_everywhere', 'viewed_count': 556163}, {'id': 99, 'hero': 'https://pe.tedcdn.com/images/ted/217_480x360.jpg', 'speaker': 'Jill Sobule', 'title': 'Global warming\\'s theme song, "Manhattan in January"', 'duration': 163, 'slug': 'jill_sobule_sings_to_al_gore', 'viewed_count': 591395}, {'id': 688, 'hero': 'https://pe.tedcdn.com/images/ted/132434_800x600.jpg', 'speaker': 'Mallika Sarabhai', 'title': 'Dance to change the world', 'duration': 1012, 'slug': 'mallika_sarabhai', 'viewed_count': 481838}, {'id': 413, 'hero': 'https://pe.tedcdn.com/images/ted/61166_800x600.jpg', 'speaker': 'David Holt', 'title': 'The joyful tradition of mountain music', 'duration': 1517, 'slug': 'david_holt_plays_mountain_music', 'viewed_count': 512589}]                                                                                                                                                                                                     :   1  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            :2544  
##     speaker_occupation
##  Writer      :  45    
##  Artist      :  34    
##  Designer    :  34    
##  Journalist  :  33    
##  Entrepreneur:  31    
##  Architect   :  30    
##  (Other)     :2343    
##                                                                     tags     
##  ['art', 'creativity']                                                :   3  
##  ['entertainment', 'live music', 'music', 'performance']              :   3  
##  ['live music', 'music', 'performance']                               :   3  
##  ['architecture', 'cities', 'design', 'infrastructure']               :   2  
##  ['business', 'technology']                                           :   2  
##  ['charter for compassion', 'compassion', 'global issues', 'religion']:   2  
##  (Other)                                                              :2535  
##                                          title     
##   Hidden miracles of the natural world      :   1  
##   How CRISPR lets us edit our DNA           :   1  
##  "(Nothing But) Flowers" with string quartet:   1  
##  "Awoo"                                     :   1  
##  "Black Men Ski"                            :   1  
##  "Clonie"                                   :   1  
##  (Other)                                    :2544  
##                                                                                    url       
##  https://www.ted.com/talks/9_11_healing_the_mothers_who_found_forgiveness_friendship\n:   1  
##  https://www.ted.com/talks/a_choir_as_big_as_the_internet\n                           :   1  
##  https://www.ted.com/talks/a_j_jacobs_year_of_living_biblically\n                     :   1  
##  https://www.ted.com/talks/a_robot_that_flies_like_a_bird\n                           :   1  
##  https://www.ted.com/talks/a_ted_speaker_s_worst_nightmare\n                          :   1  
##  https://www.ted.com/talks/a_whistleblower_you_haven_t_heard\n                        :   1  
##  (Other)                                                                              :2544  
##      views         
##  Min.   :   50443  
##  1st Qu.:  755793  
##  Median : 1124524  
##  Mean   : 1698297  
##  3rd Qu.: 1700760  
##  Max.   :47227110  
## 
library(reshape2)  # melt(); assumed to be the package used here
library(ggplot2)

# Reshape to long format: one row per (variable, value) pair
melted_ted <- melt(ted)
ggplot(data = melted_ted, mapping = aes(x = value)) +
  # 30 bins represented the distributions well after trying values between 10 and 100
  geom_histogram(bins = 30) +
  # Non-scientific axis labels were tried, but the numbers are too long to display, so this stays disabled:
  # scale_x_continuous(labels = function(x) format(x, scientific = FALSE)) +
  facet_wrap(~variable, scales = 'free_x')
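
If `melt()` here is the reshape2 version, a call without `id.vars` treats factor and character columns as id variables, so long text columns such as `description` and `ratings` are repeated on every row of the melted frame. A minimal sketch, assuming only the numeric columns are wanted for the histograms, selects them first and reshapes with base R's `stack()`. The three-row `toy` data frame is a hypothetical stand-in built from values in the `head(ted, 3)` output.

```r
# Hypothetical three-row stand-in for `ted`
toy <- data.frame(
  main_speaker = c("Ken Robinson", "Al Gore", "David Pogue"),
  comments     = c(4553, 265, 124),
  duration     = c(1164, 977, 1286),
  languages    = c(60, 43, 26)
)

# Keep only the numeric columns, then reshape to long format
numeric_cols <- toy[vapply(toy, is.numeric, logical(1))]
long <- stack(numeric_cols)            # columns: values, ind
names(long) <- c("value", "variable")  # match the names used in the plot

print(long)
```

A frame shaped like `long` could be passed to the `ggplot()` call in place of `melted_ted`, since it has the same `value` and `variable` columns.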

Converting Dates into variables
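
The `published_date` and `film_date` columns hold Unix timestamps (seconds since 1970-01-01 UTC). As a hedged cross-check of the conversion in this step, the sketch below does the same thing with base R only, so it does not depend on how a particular version of the anytime package interprets numeric input. The helper name `epoch_to_date` is hypothetical, and the sample values are copied from the `head(ted, 3)` output in this section.

```r
# Sample timestamps copied from the first three rows of the data
published_date <- c(1151367060, 1151367060, 1151367060)
film_date      <- c(1140825600, 1140825600, 1140739200)

# Hypothetical helper: Unix epoch seconds -> Date (UTC)
epoch_to_date <- function(seconds) {
  as.Date(as.POSIXct(seconds, origin = "1970-01-01", tz = "UTC"), tz = "UTC")
}

date_pub <- epoch_to_date(published_date)
parts <- data.frame(
  date_pub = date_pub,
  year     = as.integer(format(date_pub, "%Y")),
  month    = as.integer(format(date_pub, "%m")),
  day      = format(date_pub, "%a")  # abbreviated weekday, locale-dependent
)

print(parts)
print(epoch_to_date(film_date))
```

The first talks come out as published on 2006-06-27, and the abbreviated weekday depends on the session locale, just as it does with `weekdays()`.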

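library(anytime)    # anydate()
library(lubridate)  # month(), year(); assumed source of these helpers

# Derive calendar fields from the publication timestamp
ted$date_pub <- anydate(ted$published_date)
ted$month <- month(ted$date_pub)
ted$year <- year(ted$date_pub)
ted$day <- weekdays(ted$date_pub, abbreviate = TRUE)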
head(ted,3)
##   comments
## 1     4553
## 2      265
## 3      124
##                                                                                                                                                                                                                                 description
## 1                                                                                     Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.
## 2 With the same humor and humanity he exuded in "An Inconvenient Truth," Al Gore spells out 15 ways that individuals can address climate change immediately, from buying a hybrid to inventing a new, hotter brand name for global warming.
## 3                                New York Times columnist David Pogue takes aim at technology’s worst interface-design offenders, and provides encouraging examples of products that get it right. To funny things up, he bursts into song.
##   duration   event  film_date languages main_speaker
## 1     1164 TED2006 1140825600        60 Ken Robinson
## 2      977 TED2006 1140825600        43      Al Gore
## 3     1286 TED2006 1140739200        26  David Pogue
##                                        name num_speaker published_date
## 1 Ken Robinson: Do schools kill creativity?           1     1151367060
## 2      Al Gore: Averting the climate crisis           1     1151367060
## 3             David Pogue: Simplicity sells           1     1151367060
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ratings
## 1 [{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]
## 2                  [{'id': 7, 'name': 'Funny', 'count': 544}, {'id': 3, 'name': 'Courageous', 'count': 139}, {'id': 2, 'name': 'Confusing', 'count': 62}, {'id': 1, 'name': 'Beautiful', 'count': 58}, {'id': 21, 'name': 'Unconvincing', 'count': 258}, {'id': 11, 'name': 'Longwinded', 'count': 113}, {'id': 8, 'name': 'Informative', 'count': 443}, {'id': 10, 'name': 'Inspiring', 'count': 413}, {'id': 22, 'name': 'Fascinating', 'count': 132}, {'id': 9, 'name': 'Ingenious', 'count': 56}, {'id': 24, 'name': 'Persuasive', 'count': 268}, {'id': 23, 'name': 'Jaw-dropping', 'count': 116}, {'id': 26, 'name': 'Obnoxious', 'count': 131}, {'id': 25, 'name': 'OK', 'count': 203}]
## 3                    [{'id': 7, 'name': 'Funny', 'count': 964}, {'id': 3, 'name': 'Courageous', 'count': 45}, {'id': 9, 'name': 'Ingenious', 'count': 183}, {'id': 1, 'name': 'Beautiful', 'count': 60}, {'id': 21, 'name': 'Unconvincing', 'count': 104}, {'id': 11, 'name': 'Longwinded', 'count': 78}, {'id': 8, 'name': 'Informative', 'count': 395}, {'id': 10, 'name': 'Inspiring', 'count': 230}, {'id': 22, 'name': 'Fascinating', 'count': 166}, {'id': 2, 'name': 'Confusing', 'count': 27}, {'id': 25, 'name': 'OK', 'count': 146}, {'id': 24, 'name': 'Persuasive', 'count': 230}, {'id': 23, 'name': 'Jaw-dropping', 'count': 54}, {'id': 26, 'name': 'Obnoxious', 'count': 142}]
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  related_talks
## 1                                                                        [{'id': 865, 'hero': 'https://pe.tedcdn.com/images/ted/172559_800x600.jpg', 'speaker': 'Ken Robinson', 'title': 'Bring on the learning revolution!', 'duration': 1008, 'slug': 'sir_ken_robinson_bring_on_the_revolution', 'viewed_count': 7266103}, {'id': 1738, 'hero': 'https://pe.tedcdn.com/images/ted/de98b161ad1434910ff4b56c89de71af04b8b873_1600x1200.jpg', 'speaker': 'Ken Robinson', 'title': "How to escape education's death valley", 'duration': 1151, 'slug': 'ken_robinson_how_to_escape_education_s_death_valley', 'viewed_count': 6657572}, {'id': 2276, 'hero': 'https://pe.tedcdn.com/images/ted/3821f3728e0b755c7b9aea2e69cc093eca41abe1_2880x1620.jpg', 'speaker': 'Linda Cliatt-Wayman', 'title': 'How to fix a broken school? Lead fearlessly, love hard', 'duration': 1027, 'slug': 'linda_cliatt_wayman_how_to_fix_a_broken_school_lead_fearlessly_love_hard', 'viewed_count': 1617101}, {'id': 892, 'hero': 'https://pe.tedcdn.com/images/ted/e79958940573cc610ccb583619a54866c41ef303_2880x1620.jpg', 'speaker': 'Charles Leadbeater', 'title': 'Education innovation in the slums', 'duration': 1138, 'slug': 'charles_leadbeater_on_education', 'viewed_count': 772296}, {'id': 1232, 'hero': 'https://pe.tedcdn.com/images/ted/0e3e4e92d5ee8ae0e43962d447d3f790b31099b8_800x600.jpg', 'speaker': 'Geoff Mulgan', 'title': 'A short intro to the Studio School', 'duration': 376, 'slug': 'geoff_mulgan_a_short_intro_to_the_studio_school', 'viewed_count': 667971}, {'id': 2616, 'hero': 'https://pe.tedcdn.com/images/ted/71cde5a6fa6c717488fb55eff9eef939a9241761_2880x1620.jpg', 'speaker': 'Kandice Sumner', 'title': "How America's public schools keep kids in poverty", 'duration': 830, 'slug': 'kandice_sumner_how_america_s_public_schools_keep_kids_in_poverty', 'viewed_count': 1181333}]
## 2 [{'id': 243, 'hero': 'https://pe.tedcdn.com/images/ted/566c14767bd62c5ff760e483c5b16cd2753328cd_2880x1620.jpg', 'speaker': 'Al Gore', 'title': 'New thinking on the climate crisis', 'duration': 1674, 'slug': 'al_gore_s_new_thinking_on_the_climate_crisis', 'viewed_count': 1751408}, {'id': 547, 'hero': 'https://pe.tedcdn.com/images/ted/89288_800x600.jpg', 'speaker': 'Ray Anderson', 'title': 'The business logic of sustainability', 'duration': 954, 'slug': 'ray_anderson_on_the_business_logic_of_sustainability', 'viewed_count': 881833}, {'id': 2093, 'hero': 'https://pe.tedcdn.com/images/ted/146d88845861cbf768bbf8bec8b2e41f8bfc7903_2400x1800.jpg', 'speaker': 'Lord Nicholas Stern', 'title': 'The state of the climate — and what we might do about it', 'duration': 993, 'slug': 'lord_nicholas_stern_the_state_of_the_climate_and_what_we_might_do_about_it', 'viewed_count': 773779}, {'id': 2784, 'hero': 'https://pe.tedcdn.com/images/ted/e835e670a7836cf65aca2a7a644fd94398cb4b8e_2880x1620.jpg', 'speaker': 'Ted Halstead', 'title': 'A climate solution where all sides can win', 'duration': 787, 'slug': 'ted_halstead_a_climate_solution_where_all_sides_can_win', 'viewed_count': 1021315}, {'id': 2339, 'hero': 'https://pe.tedcdn.com/images/ted/f6879e1c78d4a069735bf745ecfede71ccf4a352_2880x1620.jpg', 'speaker': 'Alice Bows-Larkin', 'title': "Climate change is happening. Here's how we adapt", 'duration': 863, 'slug': 'alice_bows_larkin_we_re_too_late_to_prevent_climate_change_here_s_how_we_adapt', 'viewed_count': 1157975}, {'id': 2331, 'hero': 'https://pe.tedcdn.com/images/ted/64bbcc24af870d9b4e1768abe88e1fe041b02a6a_2880x1620.jpg', 'speaker': 'Mary Robinson', 'title': 'Why climate change is a threat to human rights', 'duration': 1307, 'slug': 'mary_robinson_why_climate_change_is_a_threat_to_human_rights', 'viewed_count': 1126293}]
## 3                                                                                                                                                                  [{'id': 1725, 'hero': 'https://pe.tedcdn.com/images/ted/b7f415a054cc0a2bfdd90d0ad5a7f64cf060150d_1600x1200.jpg', 'speaker': 'David Pogue', 'title': '10 top time-saving tech tips', 'duration': 344, 'slug': 'david_pogue_10_top_time_saving_tech_tips', 'viewed_count': 4843421}, {'id': 2274, 'hero': 'https://pe.tedcdn.com/images/ted/608e677e4392bcdcf82b068fa221b9df74a213ef_2880x1620.jpg', 'speaker': 'Tony Fadell', 'title': 'The first secret of design is ... noticing', 'duration': 1001, 'slug': 'tony_fadell_the_first_secret_of_design_is_noticing', 'viewed_count': 2005916}, {'id': 172, 'hero': 'https://pe.tedcdn.com/images/ted/b790be2f87ceffba73fe73837944400c7d61cba2_1600x1200.jpg', 'speaker': 'John Maeda', 'title': 'Designing for simplicity', 'duration': 959, 'slug': 'john_maeda_on_the_simple_life', 'viewed_count': 1215942}, {'id': 2664, 'hero': 'https://pe.tedcdn.com/images/ted/092f184f6625c2aeef10949c8d7b2aa14ba4132b_2880x1620.jpg', 'speaker': 'Dan Bricklin', 'title': 'Meet the inventor of the electronic spreadsheet', 'duration': 720, 'slug': 'dan_bricklin_meet_the_inventor_of_the_electronic_spreadsheet', 'viewed_count': 992322}, {'id': 436, 'hero': 'https://pe.tedcdn.com/images/ted/66438_800x600.jpg', 'speaker': 'David Carson', 'title': 'Design and discovery', 'duration': 1359, 'slug': 'david_carson_on_design', 'viewed_count': 774485}, {'id': 1546, 'hero': 'https://pe.tedcdn.com/images/ted/39d58abd56cf9edd6936f4f42119a15e65692dc1_1600x1200.jpg', 'speaker': 'Clay Shirky', 'title': 'How the Internet will (one day) transform government', 'duration': 1112, 'slug': 'clay_shirky_how_the_internet_will_one_day_transform_government', 'viewed_count': 1245084}]
##     speaker_occupation
## 1      Author/educator
## 2     Climate advocate
## 3 Technology columnist
##                                                                                                                                     tags
## 1                                                   ['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']
## 2 ['alternative energy', 'cars', 'climate change', 'culture', 'environment', 'global issues', 'science', 'sustainability', 'technology']
## 3            ['computers', 'entertainment', 'interface design', 'media', 'music', 'performance', 'simplicity', 'software', 'technology']
##                         title
## 1 Do schools kill creativity?
## 2 Averting the climate crisis
## 3            Simplicity sells
##                                                                     url
## 1 https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity\n
## 2        https://www.ted.com/talks/al_gore_on_averting_climate_crisis\n
## 3         https://www.ted.com/talks/david_pogue_says_simplicity_sells\n
##      views   date_pub month year day
## 1 47227110 2006-06-27     6 2006 Tue
## 2  3200520 2006-06-27     6 2006 Tue
## 3  1636292 2006-06-27     6 2006 Tue

Converting tags variable to number of tags

typeof(ted$tags[1])
## [1] "integer"
split(as.character(ted$tags[1]),", ")
## $`, `
## [1] "['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']"
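# split() groups a vector by a factor; it does not split a string (that is strsplit()), so the call above returns the tag string unchanged. For now I'll just use the count of tags.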
library(stringr)
ted$num_tags <- str_count(ted$tags, ",") +1 #counting how many tags by counting the number of commas and adding one. I verified it worked well.
# quick histogram of the number of tags
qplot(x=num_tags,data=ted)

Most talks have around 4-7 tags. Since the distribution is skewed, a polynomial or factor (categorical) representation of this variable might work better in the regression than a plain linear term. I’ll inspect this later.
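
The comma count works, but the tags can also be recovered as a proper vector. Below is a minimal sketch, assuming each entry is a Python-style list of single-quoted strings like the rows printed above. `parse_tags` is a hypothetical helper, not part of the original analysis, and it would need adjusting for tags quoted with double quotes.

```r
# Hypothetical helper (not in the original analysis): extract the quoted
# items from a string such as "['a', 'b']" and return them as a vector.
parse_tags <- function(x) {
  quoted <- regmatches(x, gregexpr("'[^']*'", x))[[1]]
  gsub("'", "", quoted, fixed = TRUE)
}

# Example string copied from the first row of the data shown above
example <- "['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']"
tags <- parse_tags(example)
print(tags)

# The parsed length agrees with the comma-counting shortcut for this string
n_commas <- lengths(regmatches(example, gregexpr(",", example, fixed = TRUE)))
print(length(tags) == n_commas + 1)
```

With the tags parsed this way, individual tags could later be tabulated or turned into indicator variables, which the comma count alone does not allow.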

Distribution of the duration of a talk

median_duration <- median(ted$duration)
cat("Median duration (seconds): ", median_duration)
## Median duration (seconds):  848
print("")
## [1] ""
cat("Mean duration (seconds): ", mean(ted$duration))
## Mean duration (seconds):  826.5102
# simple r histogram:
# hist(ted$duration)
# a nicer histogram using ggplot, also adding median number of duration line
duration_hist = ggplot(ted,aes(duration,..count..)) + 
  geom_histogram(fill="turquoise") +
  labs(x="Duration (seconds)",y="Number of talks",title="Histogram (Distribution) of Durations of TED Talks") + 
  #scale_x_continuous(limits=c(0,1500),breaks=seq(0,1500,150)) + 
  geom_vline(aes(xintercept = median(ted$duration)),linetype=4,size=1,color="white") +
  geom_vline(aes(xintercept = mean(ted$duration)),linetype=4,size=1,color="blue")
duration_hist

Distribution of number of views

### Distribution of the number of views per talk
median_views <- median(ted$views)
cat("Median number of views: ", median(ted$views))
## Median number of views:  1124524
cat("Mean number of views: ", mean(ted$views))
## Mean number of views:  1698297
# simple r histogram:
hist(ted$views)

# a nicer histogram using ggplot, also adding median number of views line
views_hist = ggplot(ted,aes(views,..count..)) + 
  geom_histogram(fill="tomato") +
  labs(x="views",y="Count",title="Histogram (Distribution) of Number Of views") +
  scale_x_continuous(limits=c(0,10000000)) + #,breaks=seq(0,10000000,10000),labels = function(x) format(x, scientific = FALSE) ) + 
  #theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_vline(aes(xintercept = median(ted$views)),linetype=4,size=1,color="white") +
  geom_vline(aes(xintercept = mean(ted$views)),linetype=4,size=1,color="blue")

views_hist

Distribution of the number of comments (in the discussion) per talk

median_comments <- median(ted$comments)
cat("Median number of comments: ", median(ted$comments))
## Median number of comments:  118
cat("Mean number of comments: ", mean(ted$comments))
## Mean number of comments:  191.5624
# simple r histogram:
hist(ted$comments)

# a nicer histogram using ggplot, also adding median number of comments line
comments_hist = ggplot(ted,aes(comments,..count..)) + 
  geom_histogram(fill="navy") +
  labs(x="Comments",y="Count",title="Histogram (Distribution) of Number Of Comments") + 
  #scale_x_continuous(limits=c(0,1500),breaks=seq(0,1500,150)) + 
  geom_vline(aes(xintercept = median(ted$comments)),linetype=4,size=1,color="white") +
  geom_vline(aes(xintercept = mean(ted$comments)),linetype=4,size=1,color="blue")

comments_hist

The number of comments is strongly right-skewed: most talks cluster at low comment counts, with a long tail of heavily discussed talks. The mean (191.6) is therefore pulled upward and is not the most representative value; the median (118) is more representative.
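
To illustrate why the mean sits above the median in a distribution shaped like this, here is a small sketch on synthetic data. The sample is simulated, not taken from the TED data, and `sample_skewness` is a hypothetical helper implementing the usual moment-based skewness.

```r
# Synthetic, right-skewed sample (NOT the TED data): log-normal values whose
# median is set near 118 purely for illustration.
set.seed(123)
x <- rlnorm(2000, meanlog = log(118), sdlog = 1)

# Moment-based sample skewness: positive for a long right tail
sample_skewness <- function(v) {
  z <- v - mean(v)
  mean(z^3) / mean(z^2)^1.5
}

skew <- sample_skewness(x)
cat("mean:", mean(x), " median:", median(x), " skewness:", skew, "\n")
```

On the real data, the same helper could be applied to `ted$comments` to quantify the skew seen in the histogram.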

Let’s explore occupations. Let’s first see the 10 most common occupations.

occupation_df <- data.frame(table(ted$speaker_occupation))
colnames(occupation_df) <- c("occupation", "appearances")
occupation_df <- occupation_df %>% arrange(desc(appearances))
head(occupation_df, 10)
##      occupation appearances
## 1        Writer          45
## 2        Artist          34
## 3      Designer          34
## 4    Journalist          33
## 5  Entrepreneur          31
## 6     Architect          30
## 7      Inventor          27
## 8  Psychologist          26
## 9  Photographer          25
## 10    Filmmaker          21
ggplot(head(occupation_df,30), 
       aes(x=reorder(occupation, appearances), 
           y=appearances, fill=occupation)) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_bar(stat="identity") + 
  guides(fill="none")

ted_occupation_summary = ted %>% group_by(speaker_occupation) %>% summarise(meanviews=mean(views)) %>% arrange(desc(meanviews)) 


ggplot(ted_occupation_summary,
  aes(factor(speaker_occupation,levels=speaker_occupation), meanviews,fill=meanviews))+
  geom_bar(stat="identity")+
  labs(x="Occupation" ,y="Mean Views",title="Occupation Vs Views")

  #scale_fill_brewer(name="Occupation",palette = "Set2")#+
  #scale_y_discrete(labels=scales::comma)

Days of week Published vs Views

# group by day and compute mean views, using dplyr
ted_by_day = ted %>% group_by(day) %>% summarise(mean_views=mean(views)) %>% arrange(desc(mean_views)) 
ted_by_day$day <- factor(ted_by_day$day, levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))

ggplot(data = ted_by_day, aes(x = factor(day), y = mean_views, fill = mean_views)) +
  geom_bar(stat="identity") +
  labs(x="Day of week published",y="Mean Views",title="Mean Views Per Day of Week") 

So day of week does seem to have some association with mean views. TED Talks published on weekends seem to get far fewer views, with Saturday the lowest, while talks published on Friday have the highest mean views. Since the major effect seems to be “weekend or not”, I’m creating a binary dummy variable for weekend versus weekday to regress on later.

# creating a weekend indicator variable (1 = published on Saturday or Sunday)
library(chron)
ted$weekend <- as.numeric(is.weekend(ted$date_pub))
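
The same indicator can also be built without the chron package. Below is a sketch in base R, assuming `date_pub` is (or can be converted to) a `Date`. The `dates` vector is made up for illustration and starts on 2006-06-27, which the rows above show as a Tuesday.

```r
# Seven consecutive dates starting on 2006-06-27 (a Tuesday)
dates <- as.Date("2006-06-27") + 0:6

# as.POSIXlt()$wday numbers the days 0-6 starting from Sunday
wday <- as.POSIXlt(dates)$wday
weekend <- as.numeric(wday %in% c(0, 6))

print(data.frame(date = dates, wday = wday, weekend = weekend))
```

Working from the numeric weekday avoids depending on locale-specific day names such as "Sat" or "Sun".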

Insights and description from the distributions:
* Number of views is a Poisson-like distribution, right-skewed (most talks at the low end, with a long tail of very popular talks).
* Duration of talks is closer to a normal distribution, but with a wide right-side tail of a few talks at longer durations, around a mean of about 13.8 minutes (826.5 seconds) and a median of about 14.1 minutes (848 seconds). Almost all talks range between 1-18 minutes (maximum length of a normal TED talk).
* Number of comments looks Poisson-like (visually resembling an exponential distribution, which it technically is not, since comments are discrete counts), strongly concentrated near the minimum (2 comments) for the unpopular videos.
* Number of tags is also Poisson-like and right-skewed, peaking between 4-7 tags.
* Number of translation languages has a peak at 0 for unpopular talks, but mostly lies between 20-40 languages.
* Film dates are sparse before 2012, whereas publication dates are spread at a much more uniform rate.
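
Whether a count variable is really Poisson-like can be checked quickly, because a Poisson distribution has variance equal to its mean. The sketch below uses simulated counts rather than the TED data, and `dispersion` is a hypothetical helper. A ratio far above 1 would suggest an overdispersed distribution instead (for example negative binomial).

```r
# Simulated counts (NOT the TED data), used only to show the check
set.seed(1)
poisson_like  <- rpois(5000, lambda = 20)
overdispersed <- rnbinom(5000, size = 1, mu = 190)

# Variance-to-mean ratio: close to 1 for Poisson counts, far above 1 otherwise
dispersion <- function(v) var(v) / mean(v)

d_pois <- dispersion(poisson_like)
d_over <- dispersion(overdispersed)
cat("Poisson sample:", d_pois, "\n")
cat("Overdispersed sample:", d_over, "\n")
```

On the real data, the equivalent call would be `dispersion(ted$comments)`, and likewise for views and number of tags.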

2 - Correlations between parameters

First is a correlation (pairs) scatterplot matrix between each pair of numerical variables. Below it is a correlation matrix with colors representing the intensity of the correlation, from 0 (white) to dark blue (+1) or dark striped red (strong negative correlation, -1), and with asterisks (***) marking significance by p-value.

Correlations

colnames(ted)
##  [1] "comments"           "description"        "duration"          
##  [4] "event"              "film_date"          "languages"         
##  [7] "main_speaker"       "name"               "num_speaker"       
## [10] "published_date"     "ratings"            "related_talks"     
## [13] "speaker_occupation" "tags"               "title"             
## [16] "url"                "views"              "date_pub"          
## [19] "month"              "year"               "day"               
## [22] "num_tags"           "weekend"
col_numeric =  c(17,1,3,6,10,23) # views, comments, duration, languages, published_date, weekend
ted_numeric = ted[,col_numeric]
pairs(ted_numeric)

We see some correlations there, but few clear ones: mainly between views, comments and languages. There is no clear correlation between views and duration or published date.

cor_matrix <- cor(ted_numeric)
res <- cor.mtest(ted_numeric, conf.level = .95)
res
## $p
##               [,1]          [,2]         [,3]         [,4]         [,5]
## [1,]  0.000000e+00 1.803322e-185 1.383469e-02 3.140578e-87 3.657160e-01
## [2,] 1.803322e-185  0.000000e+00 9.558285e-13 3.950633e-61 2.869772e-21
## [3,]  1.383469e-02  9.558285e-13 0.000000e+00 1.279484e-52 2.819913e-17
## [4,]  3.140578e-87  3.950633e-61 1.279484e-52 0.000000e+00 2.370426e-18
## [5,]  3.657160e-01  2.869772e-21 2.819913e-17 2.370426e-18 0.000000e+00
## [6,]  5.820279e-04  6.932328e-01 1.003110e-01 8.566019e-35 2.459542e-05
##              [,6]
## [1,] 5.820279e-04
## [2,] 6.932328e-01
## [3,] 1.003110e-01
## [4,] 8.566019e-35
## [5,] 2.459542e-05
## [6,] 0.000000e+00
## 
## $lowCI
##              [,1]        [,2]         [,3]       [,4]        [,5]
## [1,]  1.000000000  0.50247793  0.009942831  0.3438467 -0.05669662
## [2,]  0.502477926  1.00000000  0.102436668  0.2829638 -0.22314183
## [3,]  0.009942831  0.10243667  1.000000000 -0.3307018 -0.20382475
## [4,]  0.343846739  0.28296377 -0.330701766  1.0000000 -0.20925668
## [5,] -0.056696624 -0.22314183 -0.203824754 -0.2092567  1.00000000
## [6,] -0.106608323 -0.04661768 -0.006273864 -0.2764461 -0.12185536
##              [,6]
## [1,] -0.106608323
## [2,] -0.046617685
## [3,] -0.006273864
## [4,] -0.276446094
## [5,] -0.121855358
## [6,]  1.000000000
## 
## $uppCI
##             [,1]       [,2]        [,3]       [,4]        [,5]        [,6]
## [1,]  1.00000000  0.5582501  0.08739150  0.4104235  0.02091130 -0.02933472
## [2,]  0.55825006  1.0000000  0.17853504  0.3527427 -0.14818904  0.03101040
## [3,]  0.08739150  0.1785350  1.00000000 -0.2598468 -0.12833640  0.07127682
## [4,]  0.41042349  0.3527427 -0.25984683  1.0000000 -0.13391282 -0.20328629
## [5,]  0.02091130 -0.1481890 -0.12833640 -0.1339128  1.00000000 -0.04476215
## [6,] -0.02933472  0.0310104  0.07127682 -0.2032863 -0.04476215  1.00000000
corrplot::corrplot(cor_matrix,method="shade",bg="white",title="Correlation Matrix (by color) and significance (*)", 
                   p.mat = res$p,  sig.level = c(.001, .01, .05), pch.cex = .9, insig = "label_sig", pch.col = "white")

These correlation plots and the correlation matrix lead to the following conclusions:
* There is a relatively high positive correlation between number of comments and views, which makes sense (more audience, more comments).
* There is some positive correlation between number of translation languages and both number of views (0.38) and number of comments (0.32).
* There is a small negative correlation between duration and number of languages: the shorter the talk, the more translated languages there are, probably because shorter talks are easier to translate.

Most of the parameters don’t have strong correlations. Naturally, there was a very high correlation between published date and filmed date. Filmed date seemed to be less associated with views, both numerically and logically, since the audience is more affected by the date a TED talk is released than by whether it was recorded a month or a year earlier.
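
For a single pair of variables, the estimate, confidence interval and p-value can be obtained directly with `cor.test` from base R. The sketch below uses synthetic data, not the TED data. It also computes a Spearman rank correlation, which is a swapped-in alternative (not used in the analysis above) that tends to be less sensitive to skew and outliers.

```r
# Synthetic data (NOT the TED data): two positively related, skewed variables
set.seed(7)
n <- 500
views <- rlnorm(n, meanlog = 14, sdlog = 0.8)
comments <- views / 6000 * rlnorm(n, meanlog = 0, sdlog = 0.7)

# Pearson correlation with its 95% confidence interval and p-value
ct <- cor.test(views, comments, method = "pearson", conf.level = 0.95)
print(ct$estimate)
print(ct$conf.int)
print(ct$p.value)

# Rank-based alternative that is less sensitive to skew and outliers
rho <- cor(views, comments, method = "spearman")
print(rho)
```

Given how skewed views and comments are, comparing the Pearson and Spearman values on the real data would show how much the outliers drive the reported correlations.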

3 - Correlation Of Parameters with Number of Views

How do these variables correlate with views? Is the relationship linear, nonlinear, or absent? This matters for deciding how to enter them into the regression, if at all.

Here I’ll start inspecting the relationships between the variables and the dependent variable, and what form would suit each one in the regression (linear? polynomial? factor? none?).

# Views By Duration
ggplot(ted, aes(duration, views, size = views, col = duration)) + 
  geom_point(alpha=0.8) + 
  geom_smooth(method = loess, colour="White") + 
  geom_smooth(method = lm, colour="Pink") + 
  labs(x="Duration",y="Views",title="Views By Duration")

# Views By Comments
ggplot(ted, aes(comments, views, size = views, col = comments)) + 
  geom_point(alpha=0.8) + 
  geom_smooth(method = loess, colour="White") + 
  geom_smooth(method = lm, colour="Pink") + 
  labs(x="Comments",y="Views",title="Views By Comments")

# Views By Languages 
ggplot(ted, aes(languages, views, size = views, col = languages)) + 
  geom_point(alpha=0.8) + 
  geom_smooth(method = loess, colour="White") + 
  geom_smooth(method = lm, colour="Pink") + 
  labs(x="Languages",y="Views",title="Views By Languages") 

# Views By Number of Tags 
ggplot(ted, aes(num_tags, views, size = views, col = num_tags)) + 
  geom_point(alpha=0.8) + 
  geom_smooth(method = loess, colour="White") + 
  geom_smooth(method = lm, colour="Pink") +  
  labs(x="Number of Tags",y="Views",title="Views By Number of Tags") 

It seems that none of these lose much information with a linear fit compared with a LOESS fit, which is flexible enough to reveal a clear non-linear shape if one existed. While some of them do show nonlinear shapes, a closer look shows this happens only in the tails, where data is scarce and the fit is biased by a few data points and outliers (as in the comments plot). Therefore, entering each regressor as a linear term might be sufficiently explanatory.
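
The visual comparison can also be backed by a number: if the in-sample error of the linear fit is close to that of the LOESS fit, the straight line loses little. A minimal sketch on synthetic data (not the TED data), where `rmse` is a hypothetical helper:

```r
# Synthetic data (NOT the TED data) with a truly linear relationship
set.seed(99)
toy <- data.frame(x = runif(500, 0, 70))
toy$y <- 1000 + 50 * toy$x + rnorm(500, sd = 400)

fit_lm    <- lm(y ~ x, data = toy)
fit_loess <- loess(y ~ x, data = toy)

# Root mean squared error of the in-sample residuals
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
rmse_lm    <- rmse(fit_lm)
rmse_loess <- rmse(fit_loess)
cat("RMSE lm:", rmse_lm, " RMSE loess:", rmse_loess, "\n")
```

Because LOESS is more flexible, its in-sample error will usually be slightly lower. What matters is whether the gap is large enough to justify a non-linear term.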

The scatterplots above overlay a flexible LOESS fit (white) and a linear fit (pink) to show how far a straight line departs from a flexible local fit. Except in the tails of these variables’ distributions, where only a few outlying points exist, the linear model describes the relationship reasonably well.

Surprisingly, duration had almost no consistent correlation with number of views, except that the most popular talks were around 8-20 minutes long. Number of comments is, unsurprisingly, clearly correlated with number of views, and so is number of languages: they all come from having many viewers. Thus it is not “fair” to predict views from these factors, and in the real world we couldn’t use them for prediction, since they are not only causes of more views but also results of many views, forming a reinforcing feedback loop: the more comments, the more engaged the community around the talk and the likelier it is to spread; the more languages, the more viewers can watch; and the more viewers, the larger the audience available to comment and translate. The remaining variables had only small, roughly linear effects.

Regression Models

I inspected how well each of the relevant variables explains the view count individually, and then how adding the relevant ones to a multiple regression improves its performance.

fit1 <- lm(views  ~ comments, data = ted)
summary(fit1)
## 
## Call:
## lm(formula = views ~ comments, data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -26514433   -667711   -254429    229873  31596994 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 798186.6    50681.6   15.75   <2e-16 ***
## comments      4698.8      148.6   31.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2118000 on 2548 degrees of freedom
## Multiple R-squared:  0.2819, Adjusted R-squared:  0.2816 
## F-statistic:  1000 on 1 and 2548 DF,  p-value: < 2.2e-16
fit2 <- lm(views  ~ languages, data = ted)
summary(fit2)
## 
## Call:
## lm(formula = views ~ languages, data = ted)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4235090  -887422  -418475   287948 42305382 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -997579     138744   -7.19 8.47e-13 ***
## languages      98655       4792   20.59  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2314000 on 2548 degrees of freedom
## Multiple R-squared:  0.1426, Adjusted R-squared:  0.1423 
## F-statistic: 423.8 on 1 and 2548 DF,  p-value: < 2.2e-16
fit3 <- lm(views  ~ duration, data = ted)
summary(fit3)
## 
## Call:
## lm(formula = views ~ duration, data = ted)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2667314  -932675  -554748    28483 45418926 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1429186.7   119912.2  11.919   <2e-16 ***
## duration        325.6      132.2   2.463   0.0138 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2496000 on 2548 degrees of freedom
## Multiple R-squared:  0.002376,   Adjusted R-squared:  0.001984 
## F-statistic: 6.068 on 1 and 2548 DF,  p-value: 0.01383
# Adding date-related information using just a weekend binary
fit4 <- lm(views ~ weekend, data=ted)
summary(fit4)
## 
## Call:
## lm(formula = views ~ weekend, data = ted)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1683960  -934543  -576696     5606 45492707 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1734403      50473  34.363  < 2e-16 ***
## weekend      -836986     243014  -3.444 0.000582 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2493000 on 2548 degrees of freedom
## Multiple R-squared:  0.004634,   Adjusted R-squared:  0.004243 
## F-statistic: 11.86 on 1 and 2548 DF,  p-value: 0.000582
# weekend was significant, with a large slope (-836986) but didn't explain data well alone.

fit5 <- lm(views  ~ num_tags, data = ted)
summary(fit5)
## 
## Call:
## lm(formula = views ~ num_tags, data = ted)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1710677  -959176  -550980    24372 45516020 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1886196      99359   18.98   <2e-16 ***
## num_tags      -25015      11474   -2.18   0.0293 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2497000 on 2548 degrees of freedom
## Multiple R-squared:  0.001862,   Adjusted R-squared:  0.00147 
## F-statistic: 4.753 on 1 and 2548 DF,  p-value: 0.02933

Comments and languages were significantly and (relatively strongly) positively correlated with views. Day of week seemed to have some correlation: weekend publication reduced views (fit4 estimates about 837,000 fewer views on average for weekend talks, although the model explains under 1% of the variance). I therefore regress on a binary weekend-or-weekday variable only, since the day of the week itself didn’t show a clear relationship.
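
The step from single-variable fits to a multiple regression can be sketched as a sequence of nested models, compared by adjusted R-squared and by F-tests on each added term. The data below are synthetic and the coefficients invented for illustration. Only the column names mirror the analysis above.

```r
# Synthetic data (NOT the TED data); coefficients below are invented
set.seed(42)
n <- 300
toy <- data.frame(
  comments  = rpois(n, 150),
  languages = rpois(n, 27),
  weekend   = rbinom(n, 1, 0.1)
)
toy$views <- 8e5 + 4700 * toy$comments + 30000 * toy$languages -
  5e5 * toy$weekend + rnorm(n, sd = 5e5)

# Nested models: each adds one regressor to the previous one
m1 <- lm(views ~ comments, data = toy)
m2 <- update(m1, . ~ . + languages)
m3 <- update(m2, . ~ . + weekend)

# R-squared and adjusted R-squared per model, then F-tests for each added term
r2     <- sapply(list(m1, m2, m3), function(m) summary(m)$r.squared)
adj_r2 <- sapply(list(m1, m2, m3), function(m) summary(m)$adj.r.squared)
print(adj_r2)
print(anova(m1, m2, m3))
```

Plain R-squared can only rise as regressors are added, so adjusted R-squared and the F-tests are the more informative guides to whether a new variable earns its place.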

Excluded Variables

# Excluded analysis

summary(lm(views  ~ event, data = ted))
## 
## Call:
## lm(formula = views ~ event, data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -15027899   -850943   -300075     86910  43952765 
## 
## Coefficients:
##                                           Estimate Std. Error t value
## (Intercept)                                 149818    2447735   0.061
## eventArbejdsglaede Live                     821776    3461619   0.237
## eventBBC TV                                 372156    3461619   0.108
## eventBowery Poetry Club                     526923    3461619   0.152
## eventBusiness Innovation Factory            154268    2826400   0.055
## eventCarnegie Mellon University             414963    3461619   0.120
## eventChautauqua Institution                 111051    2826400   0.039
## eventDICE Summit 2010                       299343    3461619   0.086
## eventDLD 2007                               613997    3461619   0.177
## eventEG 2007                                887449    2540134   0.349
## eventEG 2008                               1898441    2596215   0.731
## eventElizabeth G. Anderson School           876771    3461619   0.253
## eventEric Whitacre's Virtual Choir          249462    3461619   0.072
## eventFort Worth City Council                128854    3461619   0.037
## eventFull Spectrum Auditions               1022540    2826400   0.362
## eventGel Conference                         506574    2997850   0.169
## eventGlobal Witness HQ                      865865    3461619   0.250
## eventHandheld Learning                      101495    3461619   0.029
## eventHarvard University                    1256333    3461619   0.363
## eventINK Conference                        1493142    2643856   0.565
## eventJustice with Michael Sandel            243641    3461619   0.070
## eventLIFT 2007                             1337035    3461619   0.386
## eventMichael Howard Studios                  27177    3461619   0.008
## eventMission Blue II                       1179654    2596215   0.454
## eventMission Blue Voyage                    385465    2514808   0.153
## eventNew York State Senate                  142577    3461619   0.041
## eventNextGen:Charity                        143808    3461619   0.042
## eventPrinceton University                   509846    3461619   0.147
## eventRoyal Institution                      168605    3461619   0.049
## eventRSA Animate                            865947    2826400   0.306
## eventSerious Play 2008                      776122    2596215   0.299
## eventSkoll World Forum 2007                 268416    3461619   0.078
## eventSoulPancake                            678385    3461619   0.196
## eventStanford University                   4508818    2997850   1.504
## eventTaste3 2008                            887956    2643856   0.336
## eventTED Dialogues                         1052566    2997850   0.351
## eventTED Fellows 2015                      1146696    3461619   0.331
## eventTED Fellows Retreat 2013               741501    2681359   0.277
## eventTED Fellows Retreat 2015              1608955    2616738   0.615
## eventTED in the Field                       506328    2736650   0.185
## eventTED Prize Wish                         271394    3461619   0.078
## eventTED Residency                          817780    2681359   0.305
## eventTED Senior Fellows at TEDGlobal 2010   167309    3461619   0.048
## eventTED Studio                            1261637    2736650   0.461
## eventTED Talks Education                   4225286    2596215   1.627
## eventTED Talks Live                        1346068    2508182   0.537
## eventTED-Ed                                 233905    2580139   0.091
## eventTED-Ed Weekend                         908124    3461619   0.262
## eventTED@Bangalore                          965404    3461619   0.279
## eventTED@BCG Berlin                        1676768    2997850   0.559
## eventTED@BCG London                        2279763    2736650   0.833
## eventTED@BCG Paris                         1319829    2556575   0.516
## eventTED@BCG San Francisco                 1661422    2616738   0.635
## eventTED@BCG Singapore                     1120376    2826400   0.396
## eventTED@Cannes                            1249062    2681359   0.466
## eventTED@IBM                               1427470    2567206   0.556
## eventTED@Intel                              771995    2826400   0.273
## eventTED@Johannesburg                      1444677    3461619   0.417
## eventTED@London                             973968    3461619   0.281
## eventTED@MotorCity                          567778    2997850   0.189
## eventTED@Nairobi                            777130    3461619   0.224
## eventTED@New York                          1499272    2736650   0.548
## eventTED@NYC                               1577687    2567206   0.615
## eventTED@State                              777748    2681359   0.290
## eventTED@State Street Boston               1222452    2643856   0.462
## eventTED@State Street London               2497339    3461619   0.721
## eventTED@StateStreet Boston                1527552    3461619   0.441
## eventTED@SXSWi                              505588    3461619   0.146
## eventTED@Unilever                          1643349    2826400   0.581
## eventTED@UPS                               1648480    2681359   0.615
## eventTED1984                                824269    3461619   0.238
## eventTED1990                                470988    3461619   0.136
## eventTED1994                                431601    3461619   0.125
## eventTED1998                                601068    2643856   0.227
## eventTED2001                               1709131    2826400   0.605
## eventTED2002                                801657    2491061   0.322
## eventTED2003                                961384    2483470   0.387
## eventTED2004                               2543826    2486901   1.023
## eventTED2005                               1636369    2480592   0.660
## eventTED2006                               3124527    2474782   1.263
## eventTED2007                               1361313    2465667   0.552
## eventTED2008                               1888829    2469113   0.765
## eventTED2009                               1605078    2462436   0.652
## eventTED2010                               1648253    2465667   0.668
## eventTED2011                               1818046    2465156   0.737
## eventTED2012                               2073222    2466491   0.841
## eventTED2013                               2152882    2463578   0.874
## eventTED2014                               1923056    2462261   0.781
## eventTED2015                               1861199    2463999   0.755
## eventTED2016                               1662804    2463578   0.675
## eventTED2017                               1038440    2465934   0.421
## eventTEDActive 2011                        1106268    2826400   0.391
## eventTEDActive 2014                        1317743    2997850   0.440
## eventTEDActive 2015                        1381244    3461619   0.399
## eventTEDCity2.0                             967738    2556575   0.379
## eventTEDGlobal 2005                        1489245    2494362   0.597
## eventTEDGlobal 2007                         412332    2492651   0.165
## eventTEDGlobal 2009                        1529203    2466491   0.620
## eventTEDGlobal 2010                        1189092    2469886   0.481
## eventTEDGlobal 2011                        1567943    2465667   0.636
## eventTEDGlobal 2012                        1922618    2465156   0.780
## eventTEDGlobal 2013                        2434345    2466208   0.987
## eventTEDGlobal 2014                        1166348    2471616   0.472
## eventTEDGlobal 2017                         406263    2826400   0.144
## eventTEDGlobal>Geneva                      3235590    2556575   1.266
## eventTEDGlobal>London                      1914689    2540134   0.754
## eventTEDGlobalLondon                       2390457    2596215   0.921
## eventTEDIndia 2009                         1209267    2482456   0.487
## eventTEDLagos Ideas Search                 1098726    2997850   0.367
## eventTEDMED 2009                           2096326    2556575   0.820
## eventTEDMED 2010                            326495    2736650   0.119
## eventTEDMED 2011                            622232    2616738   0.238
## eventTEDMED 2012                           1202115    2567206   0.468
## eventTEDMED 2013                           1532082    2596215   0.590
## eventTEDMED 2014                           1594178    2567206   0.621
## eventTEDMED 2015                           1903529    2596215   0.733
## eventTEDMED 2016                           1091313    2567206   0.425
## eventTEDNairobi Ideas Search                547029    3461619   0.158
## eventTEDNYC                                1092920    2511323   0.435
## eventTEDPrize@UN                            643944    2997850   0.215
## eventTEDSalon 2006                          947072    2736650   0.346
## eventTEDSalon 2007 Hot Science              737998    2997850   0.246
## eventTEDSalon 2009 Compassion               131998    2826400   0.047
## eventTEDSalon Berlin 2014                  1554534    2567206   0.606
## eventTEDSalon London 2009                   806357    3461619   0.233
## eventTEDSalon London 2010                  1235348    2616738   0.472
## eventTEDSalon London Fall 2011             2079182    2997850   0.694
## eventTEDSalon London Fall 2012             2224279    2643856   0.841
## eventTEDSalon London Spring 2011            902113    2643856   0.341
## eventTEDSalon London Spring 2012            937842    2681359   0.350
## eventTEDSalon NY2011                       1376771    2643856   0.521
## eventTEDSalon NY2012                        802688    2736650   0.293
## eventTEDSalon NY2013                       2711822    2556575   1.061
## eventTEDSalon NY2014                       1935574    2556575   0.757
## eventTEDSalon NY2015                       1561231    3461619   0.451
## eventTEDSummit                             1380145    2483470   0.556
## eventTEDWomen 2010                         1148268    2483470   0.462
## eventTEDWomen 2013                         2355045    2580139   0.913
## eventTEDWomen 2015                         1300084    2491061   0.522
## eventTEDWomen 2016                         1190539    2496209   0.477
## eventTEDxABQ                                148958    3461619   0.043
## eventTEDxAmazonia                           835155    3461619   0.241
## eventTEDxAmericanRiviera                   4234679    3461619   1.223
## eventTEDxAmoskeagMillyard                   910036    3461619   0.263
## eventTEDxAmsterdam                          680676    2736650   0.249
## eventTEDxArendal                           1915378    3461619   0.553
## eventTEDxAsheville                          268833    3461619   0.078
## eventTEDxAthens                            1654373    2997850   0.552
## eventTEDxAtlanta                            518636    3461619   0.150
## eventTEDxAustin                            1028499    2736650   0.376
## eventTEDxBeaconStreet                      2057677    2502747   0.822
## eventTEDxBeirut                            1083286    3461619   0.313
## eventTEDxBend                              2831865    2997850   0.945
## eventTEDxBerkeley                           616554    2997850   0.206
## eventTEDxBerlin                             866083    2826400   0.306
## eventTEDxBG                                 452385    3461619   0.131
## eventTEDxBinghamtonUniversity              4190299    3461619   1.211
## eventTEDxBloomington                       9334442    2997850   3.114
## eventTEDxBoston                            1036808    2826400   0.367
## eventTEDxBoston 2009                        347009    3461619   0.100
## eventTEDxBoston 2010                        666201    3461619   0.192
## eventTEDxBoston 2011                        512720    2643856   0.194
## eventTEDxBoston 2012                        772035    2643856   0.292
## eventTEDxBoulder                           1649480    2826400   0.584
## eventTEDxBoulder 2011                       931911    2997850   0.311
## eventTEDxBratislava                        1541938    2997850   0.514
## eventTEDxBrighton                           727288    3461619   0.210
## eventTEDxBrussels                           992203    2596215   0.382
## eventTEDxCaFoscariU                        1037287    3461619   0.300
## eventTEDxCaltech                           1099048    2596215   0.423
## eventTEDxCambridge                         1458830    2596215   0.562
## eventTEDxCanberra                           453356    2826400   0.160
## eventTEDxCannes                             612223    3461619   0.177
## eventTEDxCERN                              1186139    2616738   0.453
## eventTEDxChange                             882572    2736650   0.323
## eventTEDxChapmanU                          2948658    3461619   0.852
## eventTEDxCHUV                              4532640    3461619   1.309
## eventTEDxClaremontColleges                 1195626    3461619   0.345
## eventTEDxCMU                               1323306    2997850   0.441
## eventTEDxColbyCollege                      1273819    3461619   0.368
## eventTEDxColoradoSprings                    876199    3461619   0.253
## eventTEDxColumbus                           978040    2997850   0.326
## eventTEDxColumbusWomen                     1882933    3461619   0.544
## eventTEDxConcorde                          1044078    3461619   0.302
## eventTEDxConcordiaUPortland                3120541    3461619   0.901
## eventTEDxCreativeCoast                     8295163    3461619   2.396
## eventTEDxCrenshaw                           594975    3461619   0.172
## eventTEDxDanubia                           1387572    3461619   0.401
## eventTEDxDeExtinction                       601616    2997850   0.201
## eventTEDxDelft                              769486    2997850   0.257
## eventTEDxDesMoines                         1130007    3461619   0.326
## eventTEDxDirigo                              41737    3461619   0.012
## eventTEDxDU 2010                            750288    2997850   0.250
## eventTEDxDU 2011                            219545    3461619   0.063
## eventTEDxDubai                             1474923    3461619   0.426
## eventTEDxDublin                             387551    2826400   0.137
## eventTEDxEast                               579030    2681359   0.216
## eventTEDxEastEnd                           1661284    3461619   0.480
## eventTEDxEdmonton                          1429215    3461619   0.413
## eventTEDxEQChCh                            2679666    3461619   0.774
## eventTEDxEuston                            1168636    3461619   0.338
## eventTEDxExeter                            1088551    2580139   0.422
## eventTEDxFiDiWomen                         1475958    3461619   0.426
## eventTEDxFrankfurt                          758848    3461619   0.219
## eventTEDxFulbrightDublin                    902834    3461619   0.261
## eventTEDxGatewayWomen                      1278265    3461619   0.369
## eventTEDxGeorgetown                         510751    3461619   0.148
## eventTEDxGhent                              696160    2997850   0.232
## eventTEDxGlasgow                            892466    3461619   0.258
## eventTEDxGoldenGatePark 2012               4661362    3461619   1.347
## eventTEDxGoodenoughCollege                  329060    3461619   0.095
## eventTEDxGöteborg 2010                      313548    3461619   0.091
## eventTEDxGrandRapids                       1316277    3461619   0.380
## eventTEDxGreatPacificGarbagePatch           295753    2826400   0.105
## eventTEDxGroningen                         1303424    3461619   0.377
## eventTEDxHamburg                            389389    3461619   0.112
## eventTEDxHampshireCollege                   729858    3461619   0.211
## eventTEDxHelvetia                          1045120    3461619   0.302
## eventTEDxHogeschoolUtrecht                  488429    3461619   0.141
## eventTEDxHousesOfParliament                 746927    2643856   0.283
## eventTEDxHouston                          15990432    2997850   5.334
## eventTEDxImperialCollege                   1035115    3461619   0.299
## eventTEDxIndianapolis                      3211287    3461619   0.928
## eventTEDxIndianaUniversity                  984707    3461619   0.284
## eventTEDxIslay                              -29544    3461619  -0.009
## eventTEDxJaffa 2012                        1883979    3461619   0.544
## eventTEDxJaffa 2013                        2781548    3461619   0.804
## eventTEDxKC                                1170502    2736650   0.428
## eventTEDxKids@Ambleside                    2804218    3461619   0.810
## eventTEDxKids@Brussels                      288561    3461619   0.083
## eventTEDxKrakow                             650600    2997850   0.217
## eventTEDxKyoto                             3048519    2997850   1.017
## eventTEDxLeuvenSalon                       1040006    3461619   0.300
## eventTEDxLinnaeusUniversity                4835066    3461619   1.397
## eventTEDxLondonBusinessSchool               731543    3461619   0.211
## eventTEDxMaastricht                         339533    2736650   0.124
## eventTEDxManchester                         626823    2997850   0.209
## eventTEDxManhattan                         1399935    2997850   0.467
## eventTEDxManhattanBeach                    3101886    2826400   1.097
## eventTEDxMarin                             1673864    2826400   0.592
## eventTEDxMaui                              1654884    2997850   0.552
## eventTEDxMet                               4057808    2997850   1.354
## eventTEDxMIA                                199174    3461619   0.058
## eventTEDxMiamiUniversity                    257601    3461619   0.074
## eventTEDxMidAtlantic                       2155786    2518698   0.856
## eventTEDxMidAtlantic 2013                  1447978    2997850   0.483
## eventTEDxMidwest                           2330102    2826400   0.824
## eventTEDxMileHigh                           736408    2736650   0.269
## eventTEDxMonroeCorrectionalComplex          663145    3461619   0.192
## eventTEDxMonterey                            17017    3461619   0.005
## eventTEDxMontreal                           421191    3461619   0.122
## eventTEDxMtHood                            3314949    3461619   0.958
## eventTEDxMuncyStatePrison                  1160260    3461619   0.335
## eventTEDxNASA                              1122486    2997850   0.374
## eventTEDxNASA@SiliconValley                   6077    3461619   0.002
## eventTEDxNatick                             733022    3461619   0.212
## eventTEDxNewy                               723412    3461619   0.209
## eventTEDxNewYork                           1645449    2643856   0.622
## eventTEDxNextGenerationAsheville           2002916    3461619   0.579
## eventTEDxNijmegen                           640773    3461619   0.185
## eventTEDxNorrkoping                        6419675    3461619   1.855
## eventTEDxNorthwesternU                      921314    3461619   0.266
## eventTEDxNYED                              1277752    2997850   0.426
## eventTEDxO'Porto                            141433    3461619   0.041
## eventTEDxObserver                            12709    2997850   0.004
## eventTEDxOilSpill                           272667    2826400   0.096
## eventTEDxOmaha                             1464706    3461619   0.423
## eventTEDxOrangeCoast                       1474366    3461619   0.426
## eventTEDxOrcasIsland                        827772    3461619   0.239
## eventTEDxOslo                              1503846    2997850   0.502
## eventTEDxParis 2010                         382504    2997850   0.128
## eventTEDxParis 2012                        3691032    3461619   1.066
## eventTEDxPeachtree                         1642753    2826400   0.581
## eventTEDxPenn                              1199435    3461619   0.346
## eventTEDxPennQuarter                       2285024    3461619   0.660
## eventTEDxPennsylvaniaAvenue                 598667    3461619   0.173
## eventTEDxPerth                             1621258    2997850   0.541
## eventTEDxPhoenix                            134868    2736650   0.049
## eventTEDxPittsburgh                          58297    3461619   0.017
## eventTEDxPlaceDesNations                    860569    2997850   0.287
## eventTEDxPortland                          2609233    3461619   0.754
## eventTEDxPortofSpain                        333713    2997850   0.111
## eventTEDxProvidence                        1264903    3461619   0.365
## eventTEDxPSU                               1233463    2616738   0.471
## eventTEDxPuget Sound                      34159614    3461619   9.868
## eventTEDxRainier                           1753792    2681359   0.654
## eventTEDxRC2                                346408    2826400   0.123
## eventTEDxRiodelaPlata                      1100030    2596215   0.424
## eventTEDxRotterdam 2010                    1075212    2997850   0.359
## eventTEDxSaltLakeCity                       738834    3461619   0.213
## eventTEDxSanDiego                           453885    2997850   0.151
## eventTEDxSanJoseCA                          682589    3461619   0.197
## eventTEDxSanMigueldeAllende                 170578    2997850   0.057
## eventTEDxSanQuentin                        2049071    3461619   0.592
## eventTEDxSantaCruz                          137598    3461619   0.040
## eventTEDxSBU                               1140928    3461619   0.330
## eventTEDxSeattleU                           985186    3461619   0.285
## eventTEDxSeoul                             1691898    3461619   0.489
## eventTEDxSF                                3508340    3461619   1.013
## eventTEDxSFU                               1652698    3461619   0.477
## eventTEDxSiliconValley                      532055    3461619   0.154
## eventTEDxSkoll                              632462    2997850   0.211
## eventTEDxSMU                                592831    2826400   0.210
## eventTEDxSonomaCounty                      1474137    3461619   0.426
## eventTEDxSouthBank                         1714111    3461619   0.495
## eventTEDxStanford                          1069571    2580139   0.415
## eventTEDxSummit                            1452820    2547683   0.570
## eventTEDxSussexUniversity                    24508    3461619   0.007
## eventTEDxSydney                            1921570    2540134   0.756
## eventTEDxTC                                1178673    2826400   0.417
## eventTEDxTeen                               858890    2997850   0.287
## eventTEDxTelAviv 2010                       304542    2997850   0.102
## eventTEDxThessaloniki                       599151    2826400   0.212
## eventTEDxTokyo                             1411932    2997850   0.471
## eventTEDxToronto                            812671    3461619   0.235
## eventTEDxToronto 2010                      2079520    2736650   0.760
## eventTEDxToronto 2011                       407128    2997850   0.136
## eventTEDxToulouse                           708021    3461619   0.205
## eventTEDxUCL                                203532    3461619   0.059
## eventTEDxUdeM                               468653    3461619   0.135
## eventTEDxUF                                1005689    3461619   0.291
## eventTEDxUIUC                               448875    3461619   0.130
## eventTEDxUM                                 611234    3461619   0.177
## eventTEDxUMKC                              1359370    3461619   0.393
## eventTEDxUniversityofNevada                1563970    3461619   0.452
## eventTEDxUofM                              1482095    3461619   0.428
## eventTEDxUSC                                842377    2580139   0.326
## eventTEDxUW                                5767383    3461619   1.666
## eventTEDxVancouver                          826486    2736650   0.302
## eventTEDxVictoria                           642961    3461619   0.186
## eventTEDxVienna                             719644    2826400   0.255
## eventTEDxVirginiaTech                       722197    3461619   0.209
## eventTEDxWarwick                            888952    2736650   0.325
## eventTEDxWaterloo                           -11006    3461619  -0.003
## eventTEDxWinnipeg                          1048767    3461619   0.303
## eventTEDxWitsUniversity                     854045    3461619   0.247
## eventTEDxWomen 2011                         878411    2616738   0.336
## eventTEDxWomen 2012                        1174267    2643856   0.444
## eventTEDxYouth@Manchester                  1576322    2997850   0.526
## eventTEDxYouth@Sydney                       646076    2997850   0.216
## eventTEDxYYC                                220143    2826400   0.078
## eventTEDxZurich                             471613    3461619   0.136
## eventTEDxZurich 2011                        468911    3461619   0.135
## eventTEDxZurich 2012                       1311720    2997850   0.438
## eventTEDxZurich 2013                        815473    3461619   0.236
## eventTEDYouth 2011                         1194095    2826400   0.422
## eventTEDYouth 2012                          125168    3461619   0.036
## eventTEDYouth 2013                          812988    2826400   0.288
## eventTEDYouth 2014                         1056465    2616738   0.404
## eventTEDYouth 2015                         1621288    2681359   0.605
## eventThe Do Lectures                        -37497    3461619  -0.011
## eventToronto Youth Corps                    878812    3461619   0.254
## eventUniversity of California               112014    2997850   0.037
## eventWeb 2.0 Expo 2008                      607973    3461619   0.176
## eventWorld Science Festival                3152494    3461619   0.911
##                                           Pr(>|t|)    
## (Intercept)                                0.95120    
## eventArbejdsglaede Live                    0.81237    
## eventBBC TV                                0.91439    
## eventBowery Poetry Club                    0.87903    
## eventBusiness Innovation Factory           0.95648    
## eventCarnegie Mellon University            0.90459    
## eventChautauqua Institution                0.96866    
## eventDICE Summit 2010                      0.93110    
## eventDLD 2007                              0.85923    
## eventEG 2007                               0.72684    
## eventEG 2008                               0.46471    
## eventElizabeth G. Anderson School          0.80007    
## eventEric Whitacre's Virtual Choir         0.94256    
## eventFort Worth City Council               0.97031    
## eventFull Spectrum Auditions               0.71755    
## eventGel Conference                        0.86583    
## eventGlobal Witness HQ                     0.80251    
## eventHandheld Learning                     0.97661    
## eventHarvard University                    0.71669    
## eventINK Conference                        0.57230    
## eventJustice with Michael Sandel           0.94389    
## eventLIFT 2007                             0.69935    
## eventMichael Howard Studios                0.99374    
## eventMission Blue II                       0.64960    
## eventMission Blue Voyage                   0.87819    
## eventNew York State Senate                 0.96715    
## eventNextGen:Charity                       0.96687    
## eventPrinceton University                  0.88292    
## eventRoyal Institution                     0.96116    
## eventRSA Animate                           0.75935    
## eventSerious Play 2008                     0.76501    
## eventSkoll World Forum 2007                0.93820    
## eventSoulPancake                           0.84465    
## eventStanford University                   0.13272    
## eventTaste3 2008                           0.73701    
## eventTED Dialogues                         0.72554    
## eventTED Fellows 2015                      0.74048    
## eventTED Fellows Retreat 2013              0.78216    
## eventTED Fellows Retreat 2015              0.53870    
## eventTED in the Field                      0.85323    
## eventTED Prize Wish                        0.93752    
## eventTED Residency                         0.76040    
## eventTED Senior Fellows at TEDGlobal 2010  0.96146    
## eventTED Studio                            0.64483    
## eventTED Talks Education                   0.10378    
## eventTED Talks Live                        0.59155    
## eventTED-Ed                                0.92777    
## eventTED-Ed Weekend                        0.79308    
## eventTED@Bangalore                         0.78036    
## eventTED@BCG Berlin                        0.57600    
## eventTED@BCG London                        0.40491    
## eventTED@BCG Paris                         0.60573    
## eventTED@BCG San Francisco                 0.52555    
## eventTED@BCG Singapore                     0.69185    
## eventTED@Cannes                            0.64138    
## eventTED@IBM                               0.57824    
## eventTED@Intel                             0.78477    
## eventTED@Johannesburg                      0.67647    
## eventTED@London                            0.77846    
## eventTED@MotorCity                         0.84980    
## eventTED@Nairobi                           0.82239    
## eventTED@New York                          0.58385    
## eventTED@NYC                               0.53891    
## eventTED@State                             0.77180    
## eventTED@State Street Boston               0.64386    
## eventTED@State Street London               0.47072    
## eventTED@StateStreet Boston                0.65905    
## eventTED@SXSWi                             0.88389    
## eventTED@Unilever                          0.56101    
## eventTED@UPS                               0.53876    
## eventTED1984                               0.81181    
## eventTED1990                               0.89179    
## eventTED1994                               0.90079    
## eventTED1998                               0.82018    
## eventTED2001                               0.54544    
## eventTED2002                               0.74762    
## eventTED2003                               0.69871    
## eventTED2004                               0.30647    
## eventTED2005                               0.50954    
## eventTED2006                               0.20689    
## eventTED2007                               0.58093    
## eventTED2008                               0.44436    
## eventTED2009                               0.51458    
## eventTED2010                               0.50390    
## eventTED2011                               0.46090    
## eventTED2012                               0.40069    
## eventTED2013                               0.38228    
## eventTED2014                               0.43488    
## eventTED2015                               0.45012    
## eventTED2016                               0.49978    
## eventTED2017                               0.67371    
## eventTEDActive 2011                        0.69554    
## eventTEDActive 2014                        0.66030    
## eventTEDActive 2015                        0.68992    
## eventTEDCity2.0                            0.70507    
## eventTEDGlobal 2005                        0.55054    
## eventTEDGlobal 2007                        0.86863    
## eventTEDGlobal 2009                        0.53533    
## eventTEDGlobal 2010                        0.63025    
## eventTEDGlobal 2011                        0.52490    
## eventTEDGlobal 2012                        0.43552    
## eventTEDGlobal 2013                        0.32371    
## eventTEDGlobal 2014                        0.63705    
## eventTEDGlobal 2017                        0.88572    
## eventTEDGlobal>Geneva                      0.20579    
## eventTEDGlobal>London                      0.45107    
## eventTEDGlobalLondon                       0.35728    
## eventTEDIndia 2009                         0.62622    
## eventTEDLagos Ideas Search                 0.71402    
## eventTEDMED 2009                           0.41232    
## eventTEDMED 2010                           0.90504    
## eventTEDMED 2011                           0.81207    
## eventTEDMED 2012                           0.63965    
## eventTEDMED 2013                           0.55517    
## eventTEDMED 2014                           0.53468    
## eventTEDMED 2015                           0.46352    
## eventTEDMED 2016                           0.67081    
## eventTEDNairobi Ideas Search               0.87445    
## eventTEDNYC                                0.66346    
## eventTEDPrize@UN                           0.82994    
## eventTEDSalon 2006                         0.72932    
## eventTEDSalon 2007 Hot Science             0.80557    
## eventTEDSalon 2009 Compassion              0.96276    
## eventTEDSalon Berlin 2014                  0.54489    
## eventTEDSalon London 2009                  0.81583    
## eventTEDSalon London 2010                  0.63691    
## eventTEDSalon London Fall 2011             0.48803    
## eventTEDSalon London Fall 2012             0.40027    
## eventTEDSalon London Spring 2011           0.73298    
## eventTEDSalon London Spring 2012           0.72655    
## eventTEDSalon NY2011                       0.60260    
## eventTEDSalon NY2012                       0.76931    
## eventTEDSalon NY2013                       0.28893    
## eventTEDSalon NY2014                       0.44907    
## eventTEDSalon NY2015                       0.65203    
## eventTEDSummit                             0.57845    
## eventTEDWomen 2010                         0.64387    
## eventTEDWomen 2013                         0.36147    
## eventTEDWomen 2015                         0.60179    
## eventTEDWomen 2016                         0.63345    
## eventTEDxABQ                               0.96568    
## eventTEDxAmazonia                          0.80938    
## eventTEDxAmericanRiviera                   0.22134    
## eventTEDxAmoskeagMillyard                  0.79266    
## eventTEDxAmsterdam                         0.80360    
## eventTEDxArendal                           0.58010    
## eventTEDxAsheville                         0.93810    
## eventTEDxAthens                            0.58111    
## eventTEDxAtlanta                           0.88092    
## eventTEDxAustin                            0.70708    
## eventTEDxBeaconStreet                      0.41107    
## eventTEDxBeirut                            0.75435    
## eventTEDxBend                              0.34495    
## eventTEDxBerkeley                          0.83707    
## eventTEDxBerlin                            0.75931    
## eventTEDxBG                                0.89604    
## eventTEDxBinghamtonUniversity              0.22622    
## eventTEDxBloomington                       0.00187 ** 
## eventTEDxBoston                            0.71378    
## eventTEDxBoston 2009                       0.92016    
## eventTEDxBoston 2010                       0.84740    
## eventTEDxBoston 2011                       0.84625    
## eventTEDxBoston 2012                       0.77031    
## eventTEDxBoulder                           0.55955    
## eventTEDxBoulder 2011                      0.75594    
## eventTEDxBratislava                        0.60706    
## eventTEDxBrighton                          0.83361    
## eventTEDxBrussels                          0.70237    
## eventTEDxCaFoscariU                        0.76447    
## eventTEDxCaltech                           0.67210    
## eventTEDxCambridge                         0.57424    
## eventTEDxCanberra                          0.87258    
## eventTEDxCannes                            0.85963    
## eventTEDxCERN                              0.65039    
## eventTEDxChange                            0.74710    
## eventTEDxChapmanU                          0.39441    
## eventTEDxCHUV                              0.19054    
## eventTEDxClaremontColleges                 0.72983    
## eventTEDxCMU                               0.65895    
## eventTEDxColbyCollege                      0.71292    
## eventTEDxColoradoSprings                   0.80020    
## eventTEDxColumbus                          0.74427    
## eventTEDxColumbusWomen                     0.58653    
## eventTEDxConcorde                          0.76297    
## eventTEDxConcordiaUPortland                0.36744    
## eventTEDxCreativeCoast                     0.01664 *  
## eventTEDxCrenshaw                          0.86355    
## eventTEDxDanubia                           0.68857    
## eventTEDxDeExtinction                      0.84097    
## eventTEDxDelft                             0.79745    
## eventTEDxDesMoines                         0.74412    
## eventTEDxDirigo                            0.99038    
## eventTEDxDU 2010                           0.80240    
## eventTEDxDU 2011                           0.94944    
## eventTEDxDubai                             0.67009    
## eventTEDxDublin                            0.89095    
## eventTEDxEast                              0.82905    
## eventTEDxEastEnd                           0.63134    
## eventTEDxEdmonton                          0.67974    
## eventTEDxEQChCh                            0.43895    
## eventTEDxEuston                            0.73570    
## eventTEDxExeter                            0.67314    
## eventTEDxFiDiWomen                         0.66987    
## eventTEDxFrankfurt                         0.82650    
## eventTEDxFulbrightDublin                   0.79426    
## eventTEDxGatewayWomen                      0.71196    
## eventTEDxGeorgetown                        0.88271    
## eventTEDxGhent                             0.81639    
## eventTEDxGlasgow                           0.79657    
## eventTEDxGoldenGatePark 2012               0.17825    
## eventTEDxGoodenoughCollege                 0.92428    
## eventTEDxGöteborg 2010                     0.92784    
## eventTEDxGrandRapids                       0.70380    
## eventTEDxGreatPacificGarbagePatch          0.91667    
## eventTEDxGroningen                         0.70655    
## eventTEDxHamburg                           0.91045    
## eventTEDxHampshireCollege                  0.83303    
## eventTEDxHelvetia                          0.76274    
## eventTEDxHogeschoolUtrecht                 0.88781    
## eventTEDxHousesOfParliament                0.77758    
## eventTEDxHouston                          1.06e-07 ***
## eventTEDxImperialCollege                   0.76495    
## eventTEDxIndianapolis                      0.35367    
## eventTEDxIndianaUniversity                 0.77608    
## eventTEDxIslay                             0.99319    
## eventTEDxJaffa 2012                        0.58633    
## eventTEDxJaffa 2013                        0.42175    
## eventTEDxKC                                0.66890    
## eventTEDxKids@Ambleside                    0.41798    
## eventTEDxKids@Brussels                     0.93357    
## eventTEDxKrakow                            0.82821    
## eventTEDxKyoto                             0.30931    
## eventTEDxLeuvenSalon                       0.76387    
## eventTEDxLinnaeusUniversity                0.16263    
## eventTEDxLondonBusinessSchool              0.83265    
## eventTEDxMaastricht                        0.90127    
## eventTEDxManchester                        0.83440    
## eventTEDxManhattan                         0.64056    
## eventTEDxManhattanBeach                    0.27256    
## eventTEDxMarin                             0.55376    
## eventTEDxMaui                              0.58099    
## eventTEDxMet                               0.17601    
## eventTEDxMIA                               0.95412    
## eventTEDxMiamiUniversity                   0.94069    
## eventTEDxMidAtlantic                       0.39214    
## eventTEDxMidAtlantic 2013                  0.62914    
## eventTEDxMidwest                           0.40980    
## eventTEDxMileHigh                          0.78788    
## eventTEDxMonroeCorrectionalComplex         0.84810    
## eventTEDxMonterey                          0.99608    
## eventTEDxMontreal                          0.90317    
## eventTEDxMtHood                            0.33836    
## eventTEDxMuncyStatePrison                  0.73752    
## eventTEDxNASA                              0.70812    
## eventTEDxNASA@SiliconValley                0.99860    
## eventTEDxNatick                            0.83232    
## eventTEDxNewy                              0.83448    
## eventTEDxNewYork                           0.53377    
## eventTEDxNextGenerationAsheville           0.56291    
## eventTEDxNijmegen                          0.85316    
## eventTEDxNorrkoping                        0.06380 .  
## eventTEDxNorthwesternU                     0.79015    
## eventTEDxNYED                              0.66999    
## eventTEDxO'Porto                           0.96741    
## eventTEDxObserver                          0.99662    
## eventTEDxOilSpill                          0.92315    
## eventTEDxOmaha                             0.67224    
## eventTEDxOrangeCoast                       0.67021    
## eventTEDxOrcasIsland                       0.81103    
## eventTEDxOslo                              0.61597    
## eventTEDxParis 2010                        0.89848    
## eventTEDxParis 2012                        0.28642    
## eventTEDxPeachtree                         0.56115    
## eventTEDxPenn                              0.72900    
## eventTEDxPennQuarter                       0.50926    
## eventTEDxPennsylvaniaAvenue                0.86271    
## eventTEDxPerth                             0.58870    
## eventTEDxPhoenix                           0.96070    
## eventTEDxPittsburgh                        0.98657    
## eventTEDxPlaceDesNations                   0.77409    
## eventTEDxPortland                          0.45107    
## eventTEDxPortofSpain                       0.91137    
## eventTEDxProvidence                        0.71484    
## eventTEDxPSU                               0.63742    
## eventTEDxPuget Sound                       < 2e-16 ***
## eventTEDxRainier                           0.51314    
## eventTEDxRC2                               0.90247    
## eventTEDxRiodelaPlata                      0.67182    
## eventTEDxRotterdam 2010                    0.71988    
## eventTEDxSaltLakeCity                      0.83101    
## eventTEDxSanDiego                          0.87967    
## eventTEDxSanJoseCA                         0.84370    
## eventTEDxSanMigueldeAllende                0.95463    
## eventTEDxSanQuentin                        0.55395    
## eventTEDxSantaCruz                         0.96830    
## eventTEDxSBU                               0.74174    
## eventTEDxSeattleU                          0.77598    
## eventTEDxSeoul                             0.62506    
## eventTEDxSF                                0.31093    
## eventTEDxSFU                               0.63310    
## eventTEDxSiliconValley                     0.87786    
## eventTEDxSkoll                             0.83293    
## eventTEDxSMU                               0.83388    
## eventTEDxSonomaCounty                      0.67026    
## eventTEDxSouthBank                         0.62053    
## eventTEDxStanford                          0.67852    
## eventTEDxSummit                            0.56857    
## eventTEDxSussexUniversity                  0.99435    
## eventTEDxSydney                            0.44944    
## eventTEDxTC                                0.67670    
## eventTEDxTeen                              0.77452    
## eventTEDxTelAviv 2010                      0.91909    
## eventTEDxThessaloniki                      0.83214    
## eventTEDxTokyo                             0.63770    
## eventTEDxToronto                           0.81441    
## eventTEDxToronto 2010                      0.44741    
## eventTEDxToronto 2011                      0.89199    
## eventTEDxToulouse                          0.83795    
## eventTEDxUCL                               0.95312    
## eventTEDxUdeM                              0.89232    
## eventTEDxUF                                0.77144    
## eventTEDxUIUC                              0.89684    
## eventTEDxUM                                0.85986    
## eventTEDxUMKC                              0.69458    
## eventTEDxUniversityofNevada                0.65146    
## eventTEDxUofM                              0.66858    
## eventTEDxUSC                               0.74409    
## eventTEDxUW                                0.09584 .  
## eventTEDxVancouver                         0.76268    
## eventTEDxVictoria                          0.85267    
## eventTEDxVienna                            0.79904    
## eventTEDxVirginiaTech                      0.83476    
## eventTEDxWarwick                           0.74534    
## eventTEDxWaterloo                          0.99746    
## eventTEDxWinnipeg                          0.76194    
## eventTEDxWitsUniversity                    0.80515    
## eventTEDxWomen 2011                        0.73714    
## eventTEDxWomen 2012                        0.65698    
## eventTEDxYouth@Manchester                  0.59907    
## eventTEDxYouth@Sydney                      0.82939    
## eventTEDxYYC                               0.93792    
## eventTEDxZurich                            0.89164    
## eventTEDxZurich 2011                       0.89226    
## eventTEDxZurich 2012                       0.66175    
## eventTEDxZurich 2013                       0.81378    
## eventTEDYouth 2011                         0.67272    
## eventTEDYouth 2012                         0.97116    
## eventTEDYouth 2013                         0.77365    
## eventTEDYouth 2014                         0.68645    
## eventTEDYouth 2015                         0.54547    
## eventThe Do Lectures                       0.99136    
## eventToronto Youth Corps                   0.79962    
## eventUniversity of California              0.97020    
## eventWeb 2.0 Expo 2008                     0.86060    
## eventWorld Science Festival                0.36255    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2448000 on 2195 degrees of freedom
## Multiple R-squared:  0.1735, Adjusted R-squared:  0.04021 
## F-statistic: 1.302 on 354 and 2195 DF,  p-value: 0.0003659
# most individual events showed no significant effect on views; the few exceptions (e.g. TEDxHouston, TEDxPuget Sound) matter little overall, since the adjusted R-squared is only 0.04

summary(lm(views  ~ ted$date_pub, data = ted))
## 
## Call:
## lm(formula = views ~ ted$date_pub, data = ted)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1685735  -957633  -557580    12248 45437906 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2334019.30  704368.13   3.314 0.000934 ***
## ted$date_pub     -40.88      45.19  -0.905 0.365669    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2499000 on 2548 degrees of freedom
## Multiple R-squared:  0.0003212,  Adjusted R-squared:  -7.116e-05 
## F-statistic: 0.8186 on 1 and 2548 DF,  p-value: 0.3657
# the publication date explained almost none of the variance (R-squared = 0.0003), and its small coefficient was not significant


#fit_occ <- lm(views ~ factor(ted$speaker_occupation), data = ted)
#summary(fit_occ)

Only a few occupations were significant in predicting view counts:

## factor(ted$speaker_occupation)Author/educator                  < 2e-16 ***
## factor(ted$speaker_occupation)Autonomous systems pioneer       0.012050 *
## factor(ted$speaker_occupation)Beatboxer                        0.006905 **
## factor(ted$speaker_occupation)Blogger                          0.001010 **
## factor(ted$speaker_occupation)Bionics designer                 0.022110 *
## factor(ted$speaker_occupation)Anthropologist, expert on love   0.023640 *

These occupations belong to the speakers of some of the most popular TED talks; for example, Author/educator is the occupation of Ken Robinson, who gave the most popular TED talk of all time as well as other popular talks. However, the regression still wasn’t successful: although the R-squared was 0.53, the adjusted R-squared was -0.08, because of the huge number of predictors (one dummy variable per occupation).
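The gap between R-squared and adjusted R-squared comes from the penalty for the number of predictors. As a rough sketch (the helper name `adj_r2` is illustrative and not part of the original analysis), the standard formula can be checked against the event regression reported above, whose predictor count and degrees of freedom are printed in its summary:

```r
# Illustrative helper (an assumption, not from the original analysis):
# adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Event regression above: R-squared 0.1735 with 354 predictors and
# 2195 residual degrees of freedom, so n = 2195 + 354 + 1 = 2550.
event_adj <- adj_r2(0.1735, n = 2550, p = 354)
print(round(event_adj, 3))
```

This gives roughly 0.04, in line with the adjusted R-squared of 0.04021 in the event summary. With hundreds of dummy variables the factor (n - 1) / (n - p - 1) grows, which is how a model with an R-squared of 0.53 can end up with a negative adjusted value.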

Combining regressors into multiple regression

# Adding variables one-by-one
fit6 <- lm(views  ~ comments + languages, data = ted)
fit7 <- lm(views  ~ comments + languages + num_tags, data=ted)
fit8 <- lm(views  ~ comments + languages + num_tags + weekend , data=ted)
fit9 <- lm(views  ~ comments + languages + num_tags + weekend + duration, data=ted)

summary(fit1) #just comments
## 
## Call:
## lm(formula = views ~ comments, data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -26514433   -667711   -254429    229873  31596994 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 798186.6    50681.6   15.75   <2e-16 ***
## comments      4698.8      148.6   31.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2118000 on 2548 degrees of freedom
## Multiple R-squared:  0.2819, Adjusted R-squared:  0.2816 
## F-statistic:  1000 on 1 and 2548 DF,  p-value: < 2.2e-16
summary(fit6)
## 
## Call:
## lm(formula = views ~ comments + languages, data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -23341925   -730945   -226309    394521  31533398 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -733892.8   123037.9  -5.965 2.79e-09 ***
## comments       4044.9      151.4  26.721  < 2e-16 ***
## languages     60650.3     4468.6  13.573  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2045000 on 2547 degrees of freedom
## Multiple R-squared:  0.3303, Adjusted R-squared:  0.3298 
## F-statistic: 628.2 on 2 and 2547 DF,  p-value: < 2.2e-16
summary(fit7)
## 
## Call:
## lm(formula = views ~ comments + languages + num_tags, data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -23464948   -738986   -220033    353133  31476369 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -993507.0   152609.0  -6.510 9.01e-11 ***
## comments       4071.5      151.4  26.884  < 2e-16 ***
## languages     62446.0     4506.0  13.858  < 2e-16 ***
## num_tags      27351.3     9536.7   2.868  0.00416 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2042000 on 2546 degrees of freedom
## Multiple R-squared:  0.3325, Adjusted R-squared:  0.3317 
## F-statistic: 422.7 on 3 and 2546 DF,  p-value: < 2.2e-16
summary(fit8)
## 
## Call:
## lm(formula = views ~ comments + languages + num_tags + weekend, 
##     data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -23490106   -738708   -216928    350143  31474583 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -975622.2   158508.7  -6.155  8.7e-10 ***
## comments       4076.0      151.9  26.841  < 2e-16 ***
## languages     61949.1     4660.7  13.292  < 2e-16 ***
## num_tags      27156.9     9549.5   2.844  0.00449 ** 
## weekend      -86147.3   205940.5  -0.418  0.67575    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2043000 on 2545 degrees of freedom
## Multiple R-squared:  0.3325, Adjusted R-squared:  0.3315 
## F-statistic:   317 on 4 and 2545 DF,  p-value: < 2.2e-16
summary(fit9)
## 
## Call:
## lm(formula = views ~ comments + languages + num_tags + weekend + 
##     duration, data = ted)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -23061497   -729849   -236805    353088  31452340 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1455238.1   209605.7  -6.943 4.86e-12 ***
## comments        3931.5      157.1  25.027  < 2e-16 ***
## languages      68223.0     4986.4  13.682  < 2e-16 ***
## num_tags       26625.6     9529.9   2.794 0.005247 ** 
## weekend       -41407.3   205890.7  -0.201 0.840626    
## duration         408.8      117.3   3.487 0.000497 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2038000 on 2544 degrees of freedom
## Multiple R-squared:  0.3357, Adjusted R-squared:  0.3344 
## F-statistic: 257.1 on 5 and 2544 DF,  p-value: < 2.2e-16

Perform tests for significance of the parameters and present the results (1-2 short paragraphs).

library(stargazer)
stargazer(fit1, fit2, fit3, fit4, fit6, fit7, fit8, fit9, type="text", title="All Models Compared", align=TRUE, no.space=TRUE, font.size = "footnotesize")
## 
## All Models Compared
## =====================================================================================================================================================================================================================================
##                                                                                                                    Dependent variable:                                                                                               
##                     -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##                                                                                                                           views                                                                                                      
##                                 (1)                        (2)                       (3)                       (4)                       (5)                       (6)                       (7)                       (8)           
## -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## comments                   4,698.788***                                                                                             4,044.862***              4,071.480***              4,076.027***              3,931.536***       
##                              (148.571)                                                                                                (151.374)                 (151.444)                 (151.858)                 (157.090)        
## languages                                             98,655.110***                                                                 60,650.310***             62,446.000***             61,949.050***             68,222.950***      
##                                                        (4,792.403)                                                                   (4,468.592)               (4,505.978)               (4,660.659)               (4,986.410)       
## duration                                                                          325.599**                                                                                                                        408.844***        
##                                                                                   (132.184)                                                                                                                         (117.251)        
## weekend                                                                                                  -836,986.200***                                                                                                             
##                                                                                                           (243,014.000)                                                                                                              
## num_tags                                                                                                                                                      27,351.270***             27,156.920***             26,625.560***      
##                                                                                                                                                                (9,536.676)               (9,549.530)               (9,529.882)       
## weekend                                                                                                                                                                                  -86,147.350               -41,407.250       
##                                                                                                                                                                                         (205,940.500)             (205,890.700)      
## Constant                  798,186.600***             -997,579.200***          1,429,187.000***          1,734,403.000***           -733,892.800***           -993,507.000***           -975,622.200***          -1,455,238.000***    
##                            (50,681.560)               (138,743.900)             (119,912.200)             (50,472.810)              (123,037.900)             (152,609.000)             (158,508.700)             (209,605.700)      
## -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## Observations                   2,550                      2,550                     2,550                     2,550                     2,550                     2,550                     2,550                     2,550          
## R2                             0.282                      0.143                     0.002                     0.005                     0.330                     0.332                     0.333                     0.336          
## Adjusted R2                    0.282                      0.142                     0.002                     0.004                     0.330                     0.332                     0.331                     0.334          
## Residual Std. Error  2,117,652.000 (df = 2548)  2,313,944.000 (df = 2548) 2,496,000.000 (df = 2548) 2,493,173.000 (df = 2548) 2,045,392.000 (df = 2547) 2,042,496.000 (df = 2546) 2,042,827.000 (df = 2545) 2,038,364.000 (df = 2544)
## F Statistic         1,000.232*** (df = 1; 2548) 423.772*** (df = 1; 2548)  6.068** (df = 1; 2548)   11.862*** (df = 1; 2548)  628.185*** (df = 2; 2547) 422.720*** (df = 3; 2546) 316.981*** (df = 4; 2545) 257.128*** (df = 5; 2544)
## =====================================================================================================================================================================================================================================
## Note:                                                                                                                                                                                                     *p<0.1; **p<0.05; ***p<0.01
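
Models (5) and (8) in the table are nested, so one way to check whether the three extra variables add explanatory power as a group is a partial F-test. The sketch below is an addition rather than part of the original analysis: the helper name `partial_f` is hypothetical, and it works from the rounded R-squared values printed in the summaries, so the result is only approximate. With the fitted objects at hand, `anova(fit6, fit9)` would run the same comparison on the unrounded fits.

```r
# Hypothetical helper: partial F-test for nested linear models, computed
# from their R-squared values.
#   q             = number of predictors added in the full model
#   df_resid_full = residual degrees of freedom of the full model
partial_f <- function(r2_reduced, r2_full, q, df_resid_full) {
  f <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid_full)
  p <- pf(f, q, df_resid_full, lower.tail = FALSE)
  c(F = f, p_value = p)
}

# Model (5): R-squared 0.3303. Model (8): R-squared 0.3357 with
# 3 added predictors and 2544 residual degrees of freedom.
res <- partial_f(0.3303, 0.3357, q = 3, df_resid_full = 2544)
print(res)
```

A small p-value here would indicate that num_tags, weekend and duration jointly improve on model (5), even though the gain in R-squared is modest.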

Models and Results

I chose the last model, model 8, because it has the highest R-squared and adjusted R-squared and its overall F-test is highly significant, although it improves only marginally on model 5, which uses only comments and the number of languages translated.

Model 5:

$$Y(\text{views}) = \beta_0 + \beta_1\,\text{comments} + \beta_2\,\text{languages} + \epsilon$$

Model 8:

$$Y(\text{views}) = \beta_0 + \beta_1\,\text{comments} + \beta_2\,\text{languages} + \beta_3\,\text{numtags} + \beta_4\,\text{isweekend} + \beta_5\,\text{duration} + \epsilon$$

For prediction purposes, I would choose model 8 with all variables. For explanatory purposes, I would choose model 5, which shows that comments and languages are by far the variables most strongly correlated with views and account for most of the explained variance.

Model 5 suggests that every additional comment is associated with about 4,045 more views (p-value under 0.01) and that every additional language translated is associated with about 60,650 more views (p-value under 0.01). However, the constant is negative (-733,893 views), which makes no sense for a view count, but that comes with the restriction of a linear model. Together these two predictors explain 0.33 of the variance (both R-squared and adjusted R-squared), and the F-statistic (628.2 on 2 and 2547 degrees of freedom) is highly significant:

$$Y(\text{views}) = -733{,}893 + 4{,}044.9\,\text{comments} + 60{,}650.3\,\text{languages}$$

Adding all the other variables in model 8 slightly improved the R-squared to 0.336 and the adjusted R-squared to 0.334. So, if we are after accuracy for prediction, I would use this latter model:

$$Y(\text{views}) = -1{,}455{,}238 + 3{,}931.5\,\text{comments} + 68{,}223.0\,\text{languages} + 26{,}625.6\,\text{numtags} - 41{,}407.3\,\text{isweekend} + 408.8\,\text{duration}$$

The results, and particularly model 8, show overall significance. Most variables are significant, although weekend is not (p = 0.84). Adding weekend on its own did not improve the fit (the adjusted R-squared went from 0.3317 in model 6 to 0.3315 in model 7), but I’m keeping it in the final model. The F-statistic is lower than in the smaller models (257.1 versus 628.2 for model 5), and the R-squared and adjusted R-squared are not great at 0.336 and 0.334 respectively, but this is the best performance out of this set of models. The constant became much more negative (-1,455,238), so the variables have to contribute more to raise the predicted view count.

The coefficient (estimated effect) of comments decreased from 4,045 to 3,932. This was offset by a higher coefficient for the number of languages and by the coefficients of the newly added variables: about 409 more views for every additional second of duration, 26,626 more views for every additional tag, and 41,407 fewer predicted views if the talk was published on a weekend.
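
To make the two fitted equations concrete, the sketch below plugs a made-up talk into each of them. The function names and the example talk are illustrative assumptions; only the coefficients come from the `summary(fit6)` and `summary(fit9)` output above, and `weekend` is assumed to be coded 0/1. With the fitted objects available, `predict(fit9, newdata = ...)` would do the same job.

```r
# Coefficients copied from the summary(fit6) and summary(fit9) output above.
predict_model5 <- function(comments, languages) {
  -733892.8 + 4044.9 * comments + 60650.3 * languages
}

predict_model8 <- function(comments, languages, num_tags, weekend, duration) {
  -1455238.1 + 3931.5 * comments + 68223.0 * languages +
    26625.6 * num_tags - 41407.3 * weekend + 408.8 * duration
}

# Hypothetical talk: 100 comments, 27 languages, 7 tags, published on a
# weekday (weekend = 0, assuming 0/1 coding), 800 seconds long.
print(predict_model5(comments = 100, languages = 27))
print(predict_model8(100, 27, num_tags = 7, weekend = 0, duration = 800))

# A talk with very few comments and no translations gets a negative
# prediction, which is the problem with the negative constant noted above.
print(predict_model5(comments = 2, languages = 0))
```

The last call shows why the negative constant is awkward: for talks near the low end of the data, the linear model predicts an impossible negative view count.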

Conclusion and Implications

In conclusion, this very limited model does not establish a causal relationship: the fundamental problem of causal inference is not addressed by these variables, and the predictors are not independent of the y variable but closely tied to it (especially comments and number of languages, which are naturally the best predictors). I don’t believe that the available numerical predictors could have supported a causal inference. Future attempts might use the transcript of the talk to analyze its content, the audio to measure the level of applause, or the visuals in the talk and the clothing of the speaker, to better predict views from the content of the talk.

Implications

However, we can see some correlations, even if they are not causal. So, if you want a higher number of views for a talk, it would be likelier if you:

- Increase the number of comments on the talk! (get all of your friends to comment and discuss)
- Increase the number of languages it is translated into! (get your friends or freelancers to translate)
- There is no need to make it too short! (probably only works if the talk is really good)
- Tag it with more topics!
- Whatever you do, do NOT publish it on a weekend. Publish it on a Friday!

So, go increase your TED Talk’s view count and comment on whether these strategies worked! (And tell me about it, since that would be a helpful small experiment which could support or contradict these results!)