Walt Hickey, writing for FiveThirtyEight, painstakingly compiled a list of every guest who’s been on The Daily Show with Jon Stewart. I cleaned up the data just slightly (available here), threw it into R, and started messing around. The original data is under an MIT-like license and looks to be free of any restrictions. Think this would be any good for teaching? Comment here or let me know at @genetics_blog.

Quick look around

Load packages and get rid of ggplot2’s default gray background.

library(dplyr)
library(rio)
library(ggplot2)
library(knitr)
theme_set(theme_bw())

Import the data. This uses the lovely import function from the rio package, and sets the class to a "tbl_df" for easy viewing & munging with dplyr.

# This imports what's currently on the master branch. 
# d <- import("https://raw.githubusercontent.com/stephenturner/data538/master/daily-show-guests/daily_show_guests.csv", setclass="tbl_df")

# Let's be safe and import at the commit used to generate this post
d <- import("https://raw.githubusercontent.com/stephenturner/data538/8c71e8cd8ecc550bbcbde51db564fae7bedab746/daily-show-guests/daily_show_guests.csv", setclass="tbl_df")

names(d) <- c("year", "occupation", "date", "group", "guest")

Guests per year:

d %>% 
  group_by(year) %>% 
  summarize(n=n()) %>% 
  ggplot(aes(factor(year), n)) + 
    geom_bar(stat="identity") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    ggtitle("Guests per year")

Guests per year, colored by grouped occupation:

d %>% 
  group_by(group, year) %>% 
  summarize(n=n()) %>% 
  ggplot(aes(factor(year), n)) + 
    geom_bar(stat="identity", aes(fill=group)) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    ggtitle("Guests per year")

Total number of guests by grouped occupation:

d %>% 
  group_by(group) %>% 
  summarize(n=n()) %>% 
  arrange(desc(n)) %>% 
  na.omit %>% 
  ggplot(aes(reorder(group, n), n)) + 
    geom_bar(stat="identity") + 
    coord_flip() + 
    ggtitle("Guests by grouped occupation")

Guests by grouped occupation over time (top 5 only, omitting 2015):

d %>% 
  group_by(group, year) %>% 
  summarize(n=n()) %>% 
  summarize(m=mean(n)) %>% 
  arrange(desc(m)) %>% 
  filter(row_number()<=5) %>% 
  select(-m) %>% 
  inner_join(d, by="group") %>% 
  group_by(group, year) %>% 
  summarize(n=n()) %>% 
  filter(year<2015) %>% 
  ggplot(aes(year, n)) + 
    geom_line(aes(col=group), lwd=2) + 
    ggtitle("Guests by grouped occupation over time")

Repeat guests:

d %>% 
  group_by(guest) %>% 
  summarize(n=n()) %>% 
  arrange(desc(n)) %>% 
  filter(n>=5) %>% 
  ggplot(aes(reorder(guest, n), n)) + 
    geom_bar(stat="identity") + 
    coord_flip() + 
    ggtitle("Repeat guests")

Total number of guests by occupation (at least 4 appearances required), colored by grouped occupation:

d %>% 
  mutate(occupation=tolower(occupation)) %>% 
  group_by(group, occupation) %>% 
  summarize(n=n()) %>% 
  ungroup %>% 
  arrange(desc(n)) %>% 
  na.omit %>% 
  filter(n>=4) %>% 
  ggplot(aes(reorder(occupation, n), n)) + 
    geom_bar(stat="identity", aes(fill=group)) + 
    # scale_y_log10() +
    coord_flip() + 
    ggtitle("Guests by occupation")

Data integrity issues

There are still plenty of lingering data issues. When multiple guests appear on the same show, there are sometimes two entries, each listing both names (“guest 1 and guest 2”), with the occupation/group split across the two lines. For instance, both Adam Sandler and Chris Rock were on the show on June 24, 2010. There are two entries, each containing both guests’ names, but with a different occupation/group on each. You’ll also see that Jon Favreau is listed twice as an “actor” and once with the occupation “speechwriter” under the group “Political Aide.”

d %>% 
  group_by(guest) %>% 
  summarize(ngroups=n_distinct(group)) %>% 
  filter(ngroups>1) %>% 
  select(-ngroups) %>% 
  inner_join(d, by="guest") %>% 
  arrange(year, guest) %>% 
  kable

| guest | year | occupation | date | group |
|:---|---:|:---|:---|:---|
| Bruce McCulloch and Mark McKinney | 1999 | actor | 9/30/99 | Acting |
| Bruce McCulloch and Mark McKinney | 1999 | comedian | 9/30/99 | Comedy |
| Frank DeCaro’s Oscar Special, John Larroquette | 1999 | writer | 3/17/99 | Media |
| Frank DeCaro’s Oscar Special, John Larroquette | 1999 | actor | 3/17/99 | Acting |
| Hootie & the Blowfish, Billy Crystal | 1999 | rock band | 3/11/99 | Musician |
| Hootie & the Blowfish, Billy Crystal | 1999 | actor | 3/11/99 | Acting |
| Jimmy Kimmel & Adam Carolla | 2000 | television host | 6/14/00 | Media |
| Jimmy Kimmel & Adam Carolla | 2000 | comedian | 6/14/00 | Comedy |
| Posh Spice & Baby Spice | 2000 | businesswoman | 10/25/00 | Business |
| Posh Spice & Baby Spice | 2000 | singer | 10/25/00 | Musician |
| Robert Reich and Ben Stein | 2000 | Former United States Secretary of Labor | 8/4/00 | Politician |
| Robert Reich and Ben Stein | 2000 | writer | 8/4/00 | Media |
| Jon Favreau | 2001 | actor | 7/16/01 | Acting |
| David Cross Bob Odenkirk | 2002 | stand-up comedian | 6/20/02 | Comedy |
| David Cross Bob Odenkirk | 2002 | actor | 6/20/02 | Acting |
| Jon Favreau | 2002 | actor | 4/3/02 | Acting |
| William Weld & Al Sharpton | 2004 | former govrnor of masssachusetts | 11/2/04 | Politician |
| William Weld & Al Sharpton | 2004 | minister | 11/2/04 | Clergy |
| Dr. Mustafa Barghouti and Anna Baltzer | 2009 | activist | 10/28/09 | Advocacy |
| Dr. Mustafa Barghouti and Anna Baltzer | 2009 | public speaker | 10/28/09 | Misc |
| Adam Sandler and Chris Rock | 2010 | actor | 6/24/10 | Acting |
| Adam Sandler and Chris Rock | 2010 | comedian | 6/24/10 | Comedy |
| Katie Dellamaggiore and Pobo Efekoro | 2012 | film director | 11/8/12 | Media |
| Katie Dellamaggiore and Pobo Efekoro | 2012 | chess player | 11/8/12 | Misc |
| Warren Buffett and Carol Loomis | 2012 | business magnate | 11/27/12 | Business |
| Warren Buffett and Carol Loomis | 2012 | journalist | 11/27/12 | Media |
| Zach the Erect“” Galifianakis & Will Ferrell“” | 2012 | Stand-up comedian | 7/26/12 | Comedy |
| Zach the Erect“” Galifianakis & Will Ferrell“” | 2012 | actor | 7/26/12 | Acting |
| Bob Odenkirk & David Cross | 2013 | actor | 9/11/13 | Acting |
| Bob Odenkirk & David Cross | 2013 | stand-up comedian | 9/11/13 | Comedy |
| Jon Favreau | 2013 | speechwriter | 6/5/13 | Political Aide |
| Amy Yates Wuelfing & Gibby Haynes | 2014 | author | 3/25/14 | Media |
| Amy Yates Wuelfing & Gibby Haynes | 2014 | musician | 3/25/14 | Musician |
| Bruce Springsteen & Frank Caruso | 2014 | musician | 11/10/14 | Musician |
| Bruce Springsteen & Frank Caruso | 2014 | illustrator | 11/10/14 | Media |
| Kathryn Bigelow & Juan Zarate | 2014 | filmmaker | 12/9/14 | Media |
| Kathryn Bigelow & Juan Zarate | 2014 | white house official | 12/9/14 | Political Aide |
| Maziar Bahari & Gael Garcí_a Bernal | 2014 | Journalist | 11/13/14 | Media |
| Maziar Bahari & Gael Garcí_a Bernal | 2014 | film actor | 11/13/14 | Acting |
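
One possible way to handle the split entries is to collapse them to one row per appearance, keeping every occupation and group that was recorded. This is only a sketch: `d_toy` is a hypothetical stand-in for `d`, built from four rows copied out of the table above, and the `occupations`, `groups`, and `n_rows` column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: four rows copied from the table above, same columns as `d`
d_toy <- data.frame(
  year = c(2010, 2010, 2012, 2012),
  occupation = c("actor", "comedian", "business magnate", "journalist"),
  date = c("6/24/10", "6/24/10", "11/27/12", "11/27/12"),
  group = c("Acting", "Comedy", "Business", "Media"),
  guest = c("Adam Sandler and Chris Rock", "Adam Sandler and Chris Rock",
            "Warren Buffett and Carol Loomis", "Warren Buffett and Carol Loomis"),
  stringsAsFactors = FALSE
)

# Collapse to one row per (year, date, guest), pasting together the
# distinct occupations and groups seen for that appearance
appearances <- d_toy %>% 
  group_by(year, date, guest) %>% 
  summarize(
    occupations = paste(unique(occupation), collapse = "; "),
    groups = paste(unique(group), collapse = "; "),
    n_rows = n()
  ) %>% 
  ungroup()

appearances
```

On the toy data this leaves two rows instead of four. It does not split the combined guest string into individual people, and it would not help with a case like Jon Favreau, where the differing entries fall on different dates.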

You’ll also see special broadcasts (e.g., war or election coverage) and other shows where there was no guest. The missing guest is recorded inconsistently (none, None, (None), No Guest, No guest, (no guest), etc.).

d %>% filter(is.na(group)) %>% kable

| year | occupation | date | group | guest |
|---:|:---|:---|:---|:---|
| 1999 | NA | 12/15/99 | NA | Greatest Millennium Special |
| 1999 | NA | 7/21/99 | NA | Third Anniversary Special |
| 1999 | NA | 8/30/99 | NA | The Daily Show Summer Spectacular |
| 2000 | NA | 11/20/00 | NA | Tales of Survival with Vance DeGeneres |
| 2000 | NA | 12/13/00 | NA | no guest |
| 2000 | NA | 7/19/00 | NA | Fourth Anniversary Special |
| 2000 | NA | 7/31/00 | NA | Campaign Trail to the Road to the White House Mo Rocca, Vance DeGeneres |
| 2001 | NA | 5/2/01 | NA | No guest |
| 2002 | NA | 10/14/02 | NA | Road to Washington Special |
| 2002 | NA | 5/23/02 | NA | Matt Walsh Goes To Hawaii |
| 2003 | NA | 11/24/03 | NA | Who are the Daily Show? Special |
| 2003 | NA | 5/26/03 | NA | Iraq - A Look Baq (or how we learned to stop reporting and love the war) |
| 2003 | NA | 5/26/03 | NA | Iraq - A Look Baq (or how we learned to stop reporting and love the war) |
| 2003 | NA | 6/8/03 | NA | Looking Beyond The Show |
| 2003 | - | 8/14/03 | NA | Again, A Look Back |
| 2003 | NA | 9/1/03 | NA | I’m a Correspondent, Please Don’t Fire Me! |
| 2003 | NA | 9/11/03 | NA | No Guest |
| 2004 | NA | 7/29/04 | NA | None |
| 2004 | NA | 7/30/04 | NA | None |
| 2004 | 0 | 8/30/04 | NA | (None) |
| 2006 | NA | 11/1/06 | NA | None |
| 2007 | 0 | 3/8/07 | NA | John Bambenek |
| 2008 | NA | 11/4/08 | NA | Indecision 2008 Live Election Night Special |
| 2008 | 0 | 8/29/08 | NA | (no guest) |
| 2008 | 0 | 9/5/08 | NA | (no guest) |
| 2010 | NA | 10/28/10 | NA | none |
| 2012 | NA | 11/6/12 | NA | Election Night: This Ends Now |
| 2012 | NA | 3/15/12 | NA | None |
| 2012 | NA | 8/31/12 | NA | none |
| 2012 | NA | 9/7/12 | NA | none |
| 2013 | NA | 6/6/13 | NA | No Guest |
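
The “no guest” variants above could be flagged with a small helper along these lines. This is a rough sketch, not something from the original analysis: `is_no_guest()` and `guest_toy` are hypothetical names, and the list of placeholder strings only covers the variants that show up in the table above.

```r
library(dplyr)

# Hypothetical toy data: guest strings copied from the table above
guest_toy <- data.frame(
  guest = c("no guest", "No guest", "No Guest", "None", "(None)",
            "(no guest)", "none", "John Bambenek", "Third Anniversary Special"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop parentheses, trim whitespace, lowercase,
# then compare against the two placeholder spellings that remain
is_no_guest <- function(x) {
  key <- tolower(trimws(gsub("[()]", "", x)))
  key %in% c("none", "no guest")
}

flagged <- guest_toy %>% 
  mutate(no_guest = is_no_guest(guest))

flagged
```

Here the seven placeholder rows are flagged, while the named guest and the special broadcast are left alone. Specials would still need their own rule if you wanted to treat them separately.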

Also, there are 399 distinct occupation listings, but only 336 if you ignore case (e.g., there are separate entries for AUTHOR, Author, and author). There are also lots of similar occupations that should probably be grouped (e.g., “former u.s. congressman” and “former u.s. representative,” or “author” and “author of novels”).

mutate(d, occlow=tolower(occupation)) %>% 
  group_by(occlow) %>% 
  summarize(n=n_distinct(occupation)) %>% 
  filter(n>1) %>% 
  inner_join(mutate(d, occlow=tolower(occupation)), by="occlow") %>% 
  group_by(occupation) %>% 
  summarize(n=n()) %>% 
  arrange(tolower(occupation)) %>% 
  kable

| occupation | n |
|:---|---:|
| Academic | 3 |
| academic | 3 |
| Adviser | 2 |
| adviser | 2 |
| Astronaut | 1 |
| astronaut | 1 |
| Attorney | 2 |
| attorney | 4 |
| AUTHOR | 2 |
| Author | 48 |
| author | 102 |
| Baseball player | 7 |
| baseball player | 1 |
| Basketball player | 3 |
| basketball player | 9 |
| Broadcaster | 13 |
| broadcaster | 5 |
| Business magnate | 2 |
| business magnate | 4 |
| Businessman | 2 |
| businessman | 3 |
| Businesswoman | 1 |
| businesswoman | 1 |
| Chef | 2 |
| chef | 3 |
| Columnist | 8 |
| columnist | 5 |
| Comedian | 39 |
| comedian | 64 |
| Comic | 1 |
| comic | 1 |
| Commentator | 4 |
| commentator | 20 |
| Consultant | 6 |
| consultant | 2 |
| Correspondent | 2 |
| correspondent | 4 |
| Diplomat | 4 |
| diplomat | 6 |
| Director | 2 |
| director | 7 |
| Economist | 3 |
| economist | 14 |
| Editor | 2 |
| editor | 17 |
| Entrepreneur | 2 |
| entrepreneur | 1 |
| Executive | 1 |
| executive | 1 |
| Film actor | 9 |
| film actor | 10 |
| Film actress | 9 |
| film actress | 12 |
| Film director | 10 |
| film director | 14 |
| Filmmaker | 11 |
| filmmaker | 4 |
| Former American senator | 1 |
| former american senator | 3 |
| Former British Prime Minister | 2 |
| former british prime minister | 1 |
| Former Governor of Arkansas | 8 |
| former governor of arkansas | 1 |
| Former Governor of New Jersey | 1 |
| former governor of new jersey | 3 |
| Former White House Press Secretary | 4 |
| former white house press secretary | 8 |
| Guitarist | 2 |
| guitarist | 1 |
| Historian | 11 |
| historian | 11 |
| JOURNALIST | 1 |
| Journalist | 72 |
| journalist | 180 |
| Law professor | 1 |
| law professor | 3 |
| Lawyer | 5 |
| lawyer | 14 |
| Model | 3 |
| model | 6 |
| Musician | 5 |
| musician | 14 |
| Novelist | 4 |
| novelist | 2 |
| Photojournalist | 1 |
| photojournalist | 1 |
| Political figure | 1 |
| political figure | 8 |
| Political satirist | 3 |
| political satirist | 1 |
| Political Scientist | 2 |
| political scientist | 4 |
| Professional Wrestler | 2 |
| professional wrestler | 1 |
| Professor | 15 |
| professor | 22 |
| Rapper | 2 |
| rapper | 8 |
| Reporter | 8 |
| reporter | 2 |
| Rock band | 2 |
| rock band | 12 |
| Scholar | 1 |
| scholar | 2 |
| Screenwriter | 2 |
| screenwriter | 4 |
| Singer | 9 |
| singer | 14 |
| Singer-songwriter | 14 |
| singer-songwriter | 19 |
| Soccer player | 1 |
| soccer player | 2 |
| Stand-up comedian | 18 |
| stand-up comedian | 26 |
| Surgeon | 1 |
| surgeon | 2 |
| television Actor | 2 |
| television actor | 1 |
| television Personality | 2 |
| television personality | 11 |
| United States Senator | 10 |
| united states senator | 5 |
| Writer | 22 |
| writer | 30 |
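
Lowercasing takes care of the case problem, but grouping similar occupations needs some kind of lookup. Here is one way that might look, assuming a small hand-built map. The `synonyms` vector, the `occ_toy` data, and the `occfixed` column are all hypothetical, and whether any two occupations really belong together is a judgment call that the data can’t make for you.

```r
library(dplyr)

# Hypothetical toy data using occupation strings mentioned above
occ_toy <- data.frame(
  occupation = c("AUTHOR", "Author", "author", "author of novels",
                 "former u.s. congressman", "former u.s. representative"),
  stringsAsFactors = FALSE
)

# Hypothetical hand-built map: names are variants, values are the preferred label
synonyms <- c("author of novels" = "author",
              "former u.s. congressman" = "former u.s. representative")

occ_clean <- occ_toy %>% 
  mutate(occlow = tolower(occupation),
         # Swap in the preferred label where one is defined, else keep occlow
         occfixed = ifelse(occlow %in% names(synonyms),
                           unname(synonyms[occlow]), occlow))

# Distinct labels before and after each cleanup step
counts <- occ_clean %>% 
  summarize(raw = n_distinct(occupation),
            ignoring_case = n_distinct(occlow),
            after_map = n_distinct(occfixed))

counts
```

On the toy data the six raw labels drop to four after lowercasing and to two after applying the map. On the real data the map would have to be built by reading through the occupation list.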