When I first started blogging, I decided to collect some data about the TV show Friends. I ended up writing about “the frequency of characters’ shared plotlines, or character groupings, throughout the span of the entire show.” However, I collected that data by manually going through the episodes and recording the relevant groupings. Now I return to the topic with new data obtained by web scraping the show's scripts. Giora Simchoni recently put out a post on this, and I use his scraping structure as my starting point. In short, I am using Giora’s data and making new visuals from his work.

Data Scraping

First, I scrape the data using Giora’s code verbatim to produce personLines_df. With that dataframe in hand, I can manipulate it to make my own visuals! Quick note: Episodes 199 (Season 9, ep 15) and 203 (Season 9, ep 19) are missing from this dataset, so it covers 226 episodes. (Two-part episodes count as one since they have a single script.)

Lines spoken over the course of the show

First, I make my own version of the bar chart that Giora has in his blog post, illustrating the total number of lines for each of the 6 main characters.

#in eps 32-33 abbreviations are used for some of the characters
#let's standardize
personLines_df$person[personLines_df$person == "chan" | personLines_df$person == "chandler "] <- "chandler"
personLines_df$person[personLines_df$person == "phoe" | personLines_df$person == "phoebe "] <- "phoebe"
personLines_df$person[personLines_df$person == "mnca" | personLines_df$person == "monica "] <- "monica"
personLines_df$person[personLines_df$person == "rach" | personLines_df$person == "rachel "] <- "rachel"
#make all other characters "other"
personLines_df$person[!(personLines_df$person %in% c("monica", "rachel", "joey", "phoebe", "ross", "chandler"))] <- "other"
#get dataframe with totals
#count() and %>% come from dplyr (assumed already loaded by the scraping code)
total_lines <- personLines_df %>%
  count(person)
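
The standardization step can also be sketched with a lookup vector instead of repeated comparisons. This is only an illustrative alternative on a toy data frame, not the code used for the charts here: `toy_lines`, `abbreviations`, `main_cast`, and `standardize_person` are hypothetical names, and the rows are made up. Note one deliberate difference: `trimws()` strips trailing spaces from every name, which is broader than the original code (there, a value like "ross " would fall into "other").

```r
# Toy stand-in for personLines_df (hypothetical rows, not the scraped data)
toy_lines <- data.frame(
  person = c("chan", "chandler ", "mnca", "ross", "gunther", "rach", "joey", "janice"),
  stringsAsFactors = FALSE
)

# Abbreviations taken from the standardization code above
abbreviations <- c(chan = "chandler", phoe = "phoebe", mnca = "monica", rach = "rachel")
main_cast <- c("monica", "rachel", "joey", "phoebe", "ross", "chandler")

standardize_person <- function(person) {
  person <- trimws(person)                        # assumption: trim all names, not just four
  is_abbrev <- person %in% names(abbreviations)
  person[is_abbrev] <- unname(abbreviations[person[is_abbrev]])
  person[!(person %in% main_cast)] <- "other"     # everyone outside the main six
  person
}

toy_lines$person <- standardize_person(toy_lines$person)

# Base-R equivalent of count(person): one row per character with a column n
toy_totals <- as.data.frame(table(person = toy_lines$person), responseName = "n")
print(toy_totals)
```

Keeping the abbreviations in one named vector means a newly discovered spelling only needs one extra entry rather than another comparison line.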

Now I can plot my bar chart. First, I define my custom theme, as usual.

#Load more libraries
library(ggplot2);library(ggrepel); library(extrafont); library(ggthemes);library(reshape);library(grid);
library(RColorBrewer) #provides brewer.pal(), used in the theme below
#Define theme for my visuals
my_theme <- function() {
  # Define colors for the chart
  palette <- brewer.pal("Greys", n=9)
  color.background = palette[2]
  color.grid.major = palette[4]
  color.panel = palette[3]
  color.axis.text = palette[9]
  color.axis.title = palette[9]
  color.title = palette[9]
  # Create basic construction of chart
  theme_bw(base_size=9, base_family="Friends") + 
  #Seems like a good time to use the Friends-TV-show-specific font  
  # Set the entire chart region to a light gray color
  theme(panel.background=element_rect(fill=color.panel, color=color.background)) +
  theme(plot.background=element_rect(fill=color.background, color=color.background)) +
  theme(panel.border=element_rect(color=color.background)) +
  # Format grid
  theme(panel.grid.major=element_line(color=color.grid.major,size=.25)) +
  theme(panel.grid.minor=element_blank()) +
  theme(axis.ticks=element_blank()) +
  # Format legend
  theme(legend.position="right") +
  theme(legend.background = element_rect(fill=color.background)) +
  theme(legend.text = element_text(size=7,color=color.axis.title)) + 
  theme(legend.title = element_text(size=0,face="bold", color=color.axis.title)) + 
  #Format facet labels
  theme(strip.text.x = element_text(size = 8, face="bold"))+
  # Format title, axis labels, and tick marks
  theme(plot.title=element_text(color=color.title, size=18, face="bold", hjust=0)) +
  theme(axis.text.x=element_text(size=6,color=color.axis.text)) +
  theme(axis.text.y=element_text(size=6,color=color.axis.text)) +
  theme(axis.title.x=element_text(size=8,color=color.axis.title, vjust=-1, face="bold")) +
  theme(axis.title.y=element_text(size=8,color=color.axis.title, vjust=1.8, face="bold")) +
  #Format title and facet_wrap title
  theme(strip.text = element_text(size=8), plot.title = element_text(size = 10, face = "bold", colour = "black", vjust = 1, hjust=0.5))+
  # Plot margins
  theme(plot.margin = unit(c(.2, .2, .2, .2), "cm"))
}

Then I plot the bar chart.

ggplot(data=total_lines, aes(x=person, y=n, group=person, color=person, fill=person, label=n)) + 
  scale_fill_brewer(palette = "Set3") + 
  scale_color_brewer(palette = "Set3") +
  #draw the bars from the precomputed counts in n (without a bar geom only the labels appear)
  geom_bar(stat="identity")+
  my_theme()+ theme(legend.position="none")+ theme(plot.title = element_text( hjust = 0))+theme(axis.text.x=element_text(size=9))+
  geom_text(size = 2, color="black", family="Friends", position = position_stack(vjust = 0.5))+
  labs(x="", y="")+
  ggtitle("The One With All The Quantifiable Friendships", subtitle="Total Lines Spoken by Each Character and Others")

ggsave("total.png", width=7, height=4.5, dpi=900)

Episode by episode

I want to visualize the lines spoken by the 6 main characters and others over the course of the show. Giora does this in a few aggregated ways, but I am now going to show it episode by episode. So, I first make some manipulations to get a count of lines for each character and episode pairing.

episodes_df$ID <- as.numeric(rownames(episodes_df))
ep_person_lines <- personLines_df %>%
  count(person, episodeTitle, episodeNum, season) 
epi<-episodes_df[c("ID", "episodeTitle")]
ep_lines<-merge(epi, ep_person_lines, by="episodeTitle")
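
To make the count-then-merge step concrete, here is a minimal base-R sketch on toy data. Everything in it is assumed for illustration: the `toy_*` names and the episode titles are hypothetical, and `aggregate()` stands in for dplyr's `count()`. It also adds two sanity checks that are not in the code above, since merging on `episodeTitle` only works cleanly if titles are unique and every title appears in both tables.

```r
# Toy stand-ins (hypothetical values) for episodes_df and personLines_df
toy_episodes <- data.frame(
  episodeTitle = c("Episode A", "Episode B", "Episode C"),
  stringsAsFactors = FALSE
)
toy_episodes$ID <- as.numeric(rownames(toy_episodes))  # running episode index

toy_person_lines <- data.frame(
  person       = c("ross", "ross", "rachel", "ross", "other", "rachel"),
  episodeTitle = c("Episode A", "Episode A", "Episode A",
                   "Episode B", "Episode B", "Episode C"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of count(person, episodeTitle): one row per pairing
toy_person_lines$n <- 1L
toy_counts <- aggregate(n ~ person + episodeTitle, data = toy_person_lines, FUN = sum)

# Attach the running ID; merge() keeps only titles present in both tables
toy_ep_lines <- merge(toy_episodes[c("ID", "episodeTitle")], toy_counts,
                      by = "episodeTitle")

# Sanity checks: no lines lost in the merge, and titles identify episodes uniquely
stopifnot(sum(toy_ep_lines$n) == nrow(toy_person_lines))
stopifnot(!any(duplicated(toy_episodes$episodeTitle)))

print(toy_ep_lines[order(toy_ep_lines$ID), ])
```

If either check failed on the real data, merging on a numeric episode identifier rather than the title would be one way to avoid dropped or duplicated rows.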

Now I have the data in a form I can graph, so I call ggplot to make some graphs. I went through a number of attempts to get my final product. First, I simply showed the time series of lines spoken for each character all on the same graph.

ggplot(data=ep_lines, aes(x=ID, y=n, group=person, color=person)) + scale_fill_brewer(palette = "Set3") + 
  scale_color_brewer(palette = "Set3") +
  geom_line(size=0.5, alpha=1)+
  my_theme() +theme(plot.title = element_text( hjust = 0))+
  labs(x="Episode Number", y="")+
  ggtitle("The One With All The Quantifiable Friendships", subtitle = "Lines Spoken Over All The Episodes")