Walt Hickey, writing for FiveThirtyEight, painstakingly compiled a list of every guest who’s been on The Daily Show with Jon Stewart. I cleaned up the data slightly (available here), loaded it into R, and started messing around. The original data is under an MIT-like license and looks to be free of any restrictions. Think this would be any good for teaching? Comment here or let me know at @genetics_blog.
Load packages and get rid of ggplot2’s default gray background.
library(dplyr)
library(rio)
library(ggplot2)
library(knitr)
theme_set(theme_bw())
Import the data. This uses the lovely import function from the rio package and sets the class to "tbl_df" for easy viewing & munging with dplyr.
# This imports what's currently on the master branch.
# d <- import("https://raw.githubusercontent.com/stephenturner/data538/master/daily-show-guests/daily_show_guests.csv", setclass="tbl_df")
# Let's be safe and import at the commit used to generate this post
d <- import("https://raw.githubusercontent.com/stephenturner/data538/8c71e8cd8ecc550bbcbde51db564fae7bedab746/daily-show-guests/daily_show_guests.csv", setclass="tbl_df")
names(d) <- c("year", "occupation", "date", "group", "guest")
Guests per year:
d %>%
  group_by(year) %>%
  summarize(n=n()) %>%
  ggplot(aes(factor(year), n)) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Guests per year")
Guests per year, colored by grouped occupation:
d %>%
  group_by(group, year) %>%
  summarize(n=n()) %>%
  ggplot(aes(factor(year), n)) +
  geom_bar(stat="identity", aes(fill=group)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Guests per year")
Total number of guests by grouped occupation:
d %>%
  group_by(group) %>%
  summarize(n=n()) %>%
  arrange(desc(n)) %>%
  na.omit %>%
  ggplot(aes(reorder(group, n), n)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ggtitle("Guests by grouped occupation")
Guests by grouped occupation over time (top 5 only, omitting 2015):
# Find the five groups with the highest mean number of guests per year
d %>%
  group_by(group, year) %>%
  summarize(n=n()) %>%
  summarize(m=mean(n)) %>%
  arrange(desc(m)) %>%
  filter(row_number()<=5) %>%
  select(-m) %>%
  # Keep only the rows of d in those groups, then recount per group and year
  inner_join(d, by="group") %>%
  group_by(group, year) %>%
  summarize(n=n()) %>%
  filter(year<2015) %>%
  ggplot(aes(year, n)) +
  geom_line(aes(col=group), lwd=2) +
  ggtitle("Guests by grouped occupation over time")
Repeat guests:
d %>%
  group_by(guest) %>%
  summarize(n=n()) %>%
  arrange(desc(n)) %>%
  filter(n>=5) %>%
  ggplot(aes(reorder(guest, n), n)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ggtitle("Repeat guests")
Total number of guests by occupation (at least 4 appearances required), colored by grouped occupation:
d %>%
  mutate(occupation=tolower(occupation)) %>%
  group_by(group, occupation) %>%
  summarize(n=n()) %>%
  ungroup %>%
  arrange(desc(n)) %>%
  na.omit %>%
  filter(n>=4) %>%
  ggplot(aes(reorder(occupation, n), n)) +
  geom_bar(stat="identity", aes(fill=group)) +
  # scale_y_log10() +
  coord_flip() +
  ggtitle("Guests by occupation")
There are still plenty of lingering data issues. When multiple guests appear on the same show, there are sometimes two entries, each listing both names (“guest 1 and guest 2”), with the occupation/group split across the two lines. For instance, both Adam Sandler and Chris Rock were on the show on June 24, 2010. There are two entries, both containing both guests’ names, but with separate occupation/group values. You’ll also see that Jon Favreau is listed twice with the occupation “actor” and once with the occupation “speechwriter” under the group “Political Aide.”
d %>%
  group_by(guest) %>%
  summarize(ngroups=n_distinct(group)) %>%
  filter(ngroups>1) %>%
  select(-ngroups) %>%
  inner_join(d, by="guest") %>%
  arrange(year, guest) %>%
  kable
guest | year | occupation | date | group |
---|---|---|---|---|
Bruce McCulloch and Mark McKinney | 1999 | actor | 9/30/99 | Acting |
Bruce McCulloch and Mark McKinney | 1999 | comedian | 9/30/99 | Comedy |
Frank DeCaro’s Oscar Special, John Larroquette | 1999 | writer | 3/17/99 | Media |
Frank DeCaro’s Oscar Special, John Larroquette | 1999 | actor | 3/17/99 | Acting |
Hootie & the Blowfish, Billy Crystal | 1999 | rock band | 3/11/99 | Musician |
Hootie & the Blowfish, Billy Crystal | 1999 | actor | 3/11/99 | Acting |
Jimmy Kimmel & Adam Carolla | 2000 | television host | 6/14/00 | Media |
Jimmy Kimmel & Adam Carolla | 2000 | comedian | 6/14/00 | Comedy |
Posh Spice & Baby Spice | 2000 | businesswoman | 10/25/00 | Business |
Posh Spice & Baby Spice | 2000 | singer | 10/25/00 | Musician |
Robert Reich and Ben Stein | 2000 | Former United States Secretary of Labor | 8/4/00 | Politician |
Robert Reich and Ben Stein | 2000 | writer | 8/4/00 | Media |
Jon Favreau | 2001 | actor | 7/16/01 | Acting |
David Cross Bob Odenkirk | 2002 | stand-up comedian | 6/20/02 | Comedy |
David Cross Bob Odenkirk | 2002 | actor | 6/20/02 | Acting |
Jon Favreau | 2002 | actor | 4/3/02 | Acting |
William Weld & Al Sharpton | 2004 | former govrnor of masssachusetts | 11/2/04 | Politician |
William Weld & Al Sharpton | 2004 | minister | 11/2/04 | Clergy |
Dr. Mustafa Barghouti and Anna Baltzer | 2009 | activist | 10/28/09 | Advocacy |
Dr. Mustafa Barghouti and Anna Baltzer | 2009 | public speaker | 10/28/09 | Misc |
Adam Sandler and Chris Rock | 2010 | actor | 6/24/10 | Acting |
Adam Sandler and Chris Rock | 2010 | comedian | 6/24/10 | Comedy |
Katie Dellamaggiore and Pobo Efekoro | 2012 | film director | 11/8/12 | Media |
Katie Dellamaggiore and Pobo Efekoro | 2012 | chess player | 11/8/12 | Misc |
Warren Buffett and Carol Loomis | 2012 | business magnate | 11/27/12 | Business |
Warren Buffett and Carol Loomis | 2012 | journalist | 11/27/12 | Media |
Zach the Erect“” Galifianakis & Will Ferrell“” | 2012 | Stand-up comedian | 7/26/12 | Comedy |
Zach the Erect“” Galifianakis & Will Ferrell“” | 2012 | actor | 7/26/12 | Acting |
Bob Odenkirk & David Cross | 2013 | actor | 9/11/13 | Acting |
Bob Odenkirk & David Cross | 2013 | stand-up comedian | 9/11/13 | Comedy |
Jon Favreau | 2013 | speechwriter | 6/5/13 | Political Aide |
Amy Yates Wuelfing & Gibby Haynes | 2014 | author | 3/25/14 | Media |
Amy Yates Wuelfing & Gibby Haynes | 2014 | musician | 3/25/14 | Musician |
Bruce Springsteen & Frank Caruso | 2014 | musician | 11/10/14 | Musician |
Bruce Springsteen & Frank Caruso | 2014 | illustrator | 11/10/14 | Media |
Kathryn Bigelow & Juan Zarate | 2014 | filmmaker | 12/9/14 | Media |
Kathryn Bigelow & Juan Zarate | 2014 | white house official | 12/9/14 | Political Aide |
Maziar Bahari & Gael Garcí_a Bernal | 2014 | Journalist | 11/13/14 | Media |
Maziar Bahari & Gael Garcí_a Bernal | 2014 | film actor | 11/13/14 | Acting |
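One possible first step toward untangling the combined entries above is to split each guest string into individual names. The sketch below uses base R on a small toy data frame so that it runs without downloading the data; the `toy` object and the assumed separators (" and " and " & ") are illustrative and not part of the original analysis. It does not resolve which occupation belongs to which person, and entries with commas or no separator at all (e.g., “David Cross Bob Odenkirk”) would still need manual fixes.

```r
# Toy rows mimicking the combined-guest entries shown above (not the full data).
toy <- data.frame(
  guest = c("Adam Sandler and Chris Rock",
            "Jimmy Kimmel & Adam Carolla",
            "Jon Favreau"),
  date = c("6/24/10", "6/14/00", "7/16/01"),
  stringsAsFactors = FALSE
)

# Split on " and " or " & " (assumed separators; the real data uses others too).
parts <- strsplit(toy$guest, " and | & ")

# One row per individual name, repeating the original date for each name.
split_guests <- data.frame(
  guest = unlist(parts),
  date = rep(toy$date, lengths(parts)),
  stringsAsFactors = FALSE
)
split_guests
```

With one row per person, repeat-guest counts would no longer treat “Bob Odenkirk & David Cross” and “David Cross Bob Odenkirk” as unrelated guests, once the remaining separators are handled.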
You’ll also see special broadcasts (e.g., war or election coverage) and other shows where there was no guest. This is often represented differently (“none”, “None”, “(None)”, “No Guest”, “No guest”, “(no guest)”, etc.).
d %>% filter(is.na(group)) %>% kable
year | occupation | date | group | guest |
---|---|---|---|---|
1999 | NA | 12/15/99 | NA | Greatest Millennium Special |
1999 | NA | 7/21/99 | NA | Third Anniversary Special |
1999 | NA | 8/30/99 | NA | The Daily Show Summer Spectacular |
2000 | NA | 11/20/00 | NA | Tales of Survival with Vance DeGeneres |
2000 | NA | 12/13/00 | NA | no guest |
2000 | NA | 7/19/00 | NA | Fourth Anniversary Special |
2000 | NA | 7/31/00 | NA | Campaign Trail to the Road to the White House Mo Rocca, Vance DeGeneres |
2001 | NA | 5/2/01 | NA | No guest |
2002 | NA | 10/14/02 | NA | Road to Washington Special |
2002 | NA | 5/23/02 | NA | Matt Walsh Goes To Hawaii |
2003 | NA | 11/24/03 | NA | Who are the Daily Show? Special |
2003 | NA | 5/26/03 | NA | Iraq - A Look Baq (or how we learned to stop reporting and love the war) |
2003 | NA | 5/26/03 | NA | Iraq - A Look Baq (or how we learned to stop reporting and love the war) |
2003 | NA | 6/8/03 | NA | Looking Beyond The Show |
2003 | - | 8/14/03 | NA | Again, A Look Back |
2003 | NA | 9/1/03 | NA | I’m a Correspondent, Please Don’t Fire Me! |
2003 | NA | 9/11/03 | NA | No Guest |
2004 | NA | 7/29/04 | NA | None |
2004 | NA | 7/30/04 | NA | None |
2004 | 0 | 8/30/04 | NA | (None) |
2006 | NA | 11/1/06 | NA | None |
2007 | 0 | 3/8/07 | NA | John Bambenek |
2008 | NA | 11/4/08 | NA | Indecision 2008 Live Election Night Special |
2008 | 0 | 8/29/08 | NA | (no guest) |
2008 | 0 | 9/5/08 | NA | (no guest) |
2010 | NA | 10/28/10 | NA | none |
2012 | NA | 11/6/12 | NA | Election Night: This Ends Now |
2012 | NA | 3/15/12 | NA | None |
2012 | NA | 8/31/12 | NA | none |
2012 | NA | 9/7/12 | NA | none |
2013 | NA | 6/6/13 | NA | No Guest |
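The inconsistent no-guest labels in the table above could be collapsed into a single missing value. This is a minimal base R sketch on a toy vector, not a step from the original analysis; the names `guest`, `cleaned`, `is_no_guest`, and `guest_std` are made up for illustration. It only catches the “none” and “no guest” variants, so the special broadcasts (e.g., “Third Anniversary Special”) would need a separate rule.

```r
# Toy vector with label variants copied from the table above, plus one real guest.
guest <- c("none", "None", "(None)", "No Guest", "No guest", "(no guest)",
           "Jon Favreau")

# Drop parentheses, trim whitespace, and lowercase before comparing.
cleaned <- tolower(trimws(gsub("[()]", "", guest)))
is_no_guest <- cleaned %in% c("none", "no guest")

# Replace every variant with NA so the rows can be filtered consistently.
guest_std <- ifelse(is_no_guest, NA_character_, guest)
guest_std
```

The same logic would presumably fit inside a `mutate()` call on `d` if you prefer to stay in dplyr.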
Also, there are 399 distinct occupation listings, but only 336 if you ignore case (e.g., entries for AUTHOR, Author, and author). There are also lots of similar occupations that should probably be grouped (e.g., things like “former u.s. congressman” and “former u.s. representative,” or “author,” “author of novels,” etc.).
mutate(d, occlow=tolower(occupation)) %>%
  group_by(occlow) %>%
  summarize(n=n_distinct(occupation)) %>%
  filter(n>1) %>%
  inner_join(mutate(d, occlow=tolower(occupation)), by="occlow") %>%
  group_by(occupation) %>%
  summarize(n=n()) %>%
  arrange(tolower(occupation)) %>%
  kable
occupation | n |
---|---|
Academic | 3 |
academic | 3 |
Adviser | 2 |
adviser | 2 |
Astronaut | 1 |
astronaut | 1 |
Attorney | 2 |
attorney | 4 |
AUTHOR | 2 |
Author | 48 |
author | 102 |
Baseball player | 7 |
baseball player | 1 |
Basketball player | 3 |
basketball player | 9 |
Broadcaster | 13 |
broadcaster | 5 |
Business magnate | 2 |
business magnate | 4 |
Businessman | 2 |
businessman | 3 |
Businesswoman | 1 |
businesswoman | 1 |
Chef | 2 |
chef | 3 |
Columnist | 8 |
columnist | 5 |
Comedian | 39 |
comedian | 64 |
Comic | 1 |
comic | 1 |
Commentator | 4 |
commentator | 20 |
Consultant | 6 |
consultant | 2 |
Correspondent | 2 |
correspondent | 4 |
Diplomat | 4 |
diplomat | 6 |
Director | 2 |
director | 7 |
Economist | 3 |
economist | 14 |
Editor | 2 |
editor | 17 |
Entrepreneur | 2 |
entrepreneur | 1 |
Executive | 1 |
executive | 1 |
Film actor | 9 |
film actor | 10 |
Film actress | 9 |
film actress | 12 |
Film director | 10 |
film director | 14 |
Filmmaker | 11 |
filmmaker | 4 |
Former American senator | 1 |
former american senator | 3 |
Former British Prime Minister | 2 |
former british prime minister | 1 |
Former Governor of Arkansas | 8 |
former governor of arkansas | 1 |
Former Governor of New Jersey | 1 |
former governor of new jersey | 3 |
Former White House Press Secretary | 4 |
former white house press secretary | 8 |
Guitarist | 2 |
guitarist | 1 |
Historian | 11 |
historian | 11 |
JOURNALIST | 1 |
Journalist | 72 |
journalist | 180 |
Law professor | 1 |
law professor | 3 |
Lawyer | 5 |
lawyer | 14 |
Model | 3 |
model | 6 |
Musician | 5 |
musician | 14 |
Novelist | 4 |
novelist | 2 |
Photojournalist | 1 |
photojournalist | 1 |
Political figure | 1 |
political figure | 8 |
Political satirist | 3 |
political satirist | 1 |
Political Scientist | 2 |
political scientist | 4 |
Professional Wrestler | 2 |
professional wrestler | 1 |
Professor | 15 |
professor | 22 |
Rapper | 2 |
rapper | 8 |
Reporter | 8 |
reporter | 2 |
Rock band | 2 |
rock band | 12 |
Scholar | 1 |
scholar | 2 |
Screenwriter | 2 |
screenwriter | 4 |
Singer | 9 |
singer | 14 |
Singer-songwriter | 14 |
singer-songwriter | 19 |
Soccer player | 1 |
soccer player | 2 |
Stand-up comedian | 18 |
stand-up comedian | 26 |
Surgeon | 1 |
surgeon | 2 |
television Actor | 2 |
television actor | 1 |
television Personality | 2 |
television personality | 11 |
United States Senator | 10 |
united states senator | 5 |
Writer | 22 |
writer | 30 |
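A cleanup along the lines suggested above might lowercase the occupation and then map near-duplicate labels to one canonical label. The sketch below is in base R on a toy vector; the `lookup` table is a hypothetical example, and a real mapping would have to be built by inspecting the data by hand.

```r
# Toy occupation vector (hypothetical sample, not the real column).
occupation <- c("AUTHOR", "Author", "author of novels",
                "Former U.S. Congressman", "former u.s. representative", "chef")

# Step 1: ignore case.
occ <- tolower(occupation)

# Step 2: map near-duplicates to one label. This lookup is an illustrative
# assumption, not a mapping taken from the original post.
lookup <- c("author of novels" = "author",
            "former u.s. congressman" = "former u.s. representative")
hit <- occ %in% names(lookup)
occ[hit] <- unname(lookup[occ[hit]])

n_raw <- length(unique(occupation))  # distinct labels as entered
n_clean <- length(unique(occ))       # distinct labels after cleanup
c(raw = n_raw, clean = n_clean)
```

Keeping the mapping in a named vector makes each grouping decision explicit and easy to review, which matters when the merges are judgment calls.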