Overview

Rachel

Dialogue:

Emotional Range:

Self-Awareness:

Popularity:

Topics: desk, daddy, honey, assistant, shoe, purse, fashion

Ross

Dialogue:

Emotional Range:

Self-Awareness:

Popularity:

Topics: paleontology, evolution, Hanukkah, fossil, professor, student, son, bike

Chandler

Dialogue:

Emotional Range:

Self-Awareness:

Popularity:

Topics: gum, bracelet, closet, sperm, nipple, smoke, gym

Monica Phoebe Joey

Monica

Dialogue:

Emotional Range:

Self-Awareness:

Popularity:

Topics: potatoes, adoption, chef, restaurant, kitchen

Phoebe

Dialogue:

Emotional Range:

Self-Awareness:

Popularity:

Topics: client, songs, massage, guitar, birth, voice, grandma, meat

Joey

Dialogue:

Emotional Range:

Self-Awareness:

Popularity:

Topics: script, director, agent, cast, scene, audition, porsche, fridge, duck, soup, pizza

Categories

Dialogue: Number of lines * total words spoken by each character

Emotional range: Standard deviation of each character’s average emotional scores per sentence.

Self-awareness: Percentage of self-referencing words in dialogue.

Popularity: # of times the character was mentioned by another character.

Topics: Words used by one character more than the others. Character must have mentioned the word a minimum of 10 times and be responsible for at least 40% of the times the word is spoken overall.
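The self-awareness and topic rules above can be sketched in Python (the report itself was written in R). This is a minimal illustration under stated assumptions: the list of self-referencing words is hypothetical, words are split with a simple regular expression, and no stop words are removed, although a real run would likely drop stop words before looking for topics.

```python
import re
from collections import Counter

# Hypothetical self-reference list; the report does not publish its word list.
SELF_WORDS = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'll", "i'd"}


def tokenize(text):
    """Lowercase word tokens, keeping internal apostrophes (e.g. "i'm")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def self_awareness(lines):
    """Percentage of a character's words that are self-referencing."""
    words = [w for line in lines for w in tokenize(line)]
    if not words:
        return 0.0
    return 100 * sum(w in SELF_WORDS for w in words) / len(words)


def topic_words(lines_by_character, min_count=10, min_share=0.40):
    """Words a character says at least min_count times and that make up
    at least min_share of all uses of that word across characters."""
    per_char = {
        name: Counter(w for line in lines for w in tokenize(line))
        for name, lines in lines_by_character.items()
    }
    overall = Counter()
    for counts in per_char.values():
        overall.update(counts)
    return {
        name: sorted(
            w for w, n in counts.items()
            if n >= min_count and n / overall[w] >= min_share
        )
        for name, counts in per_char.items()
    }
```

For example, self_awareness(["I'm fine", "You are great"]) returns 20.0, because one of the five tokens ("i'm") is self-referencing.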

Cleaning

Overview

Aggregating and cleaning the Friends data sets was challenging. While most of the HTML pages used consistent formatting, some were subtly different and required different approaches to avoid errors and capture all of the required elements.

The code performs the following data preparation steps for each HTML file:

  1. Read in the script using rvest::read_html
  2. Locate and standardize the title
  3. Locate the paragraphs where possible and convert to text
  4. Replace non-ASCII characters used for special apostrophes with standard ASCII apostrophes
  5. Deal with episodes where the paragraphs were between newline characters rather than paragraph markers
  6. Remove rows with NAs or nothing in the script column
  7. Find transcriber(s) where possible
  8. Find writer(s) where possible
  9. Assign scene numbers where possible
  10. Find speaking character for each line of dialogue
  11. Remove unwanted punctuation or other “messy” characters
  12. Add cleaned episode text to data frame containing cleaned text from all episodes
  13. Standardize the character field to capture all instances of the six main Friends
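A few of these steps (4, 6, 10, 11 and 13) can be sketched for a single transcript line as below. This is a Python illustration, not the report's R code. It assumes each dialogue line looks like "NAME: text", only collapses whitespace as a stand-in for the fuller punctuation clean-up in step 11, and uses a hypothetical ALIASES table of abbreviated speaker names.

```python
import re

# Curly apostrophes common in the HTML transcripts (step 4).
APOSTROPHES = {"\u2019": "'", "\u2018": "'"}

# Hypothetical alias table for step 13; the report does not list the variants it fixed.
ALIASES = {"RACH": "Rachel", "MNCA": "Monica", "CHAN": "Chandler", "PHOE": "Phoebe"}

# "Speaker: dialogue" at the start of a line (step 10); DOTALL keeps multi-line dialogue.
SPEAKER = re.compile(r"^([A-Za-z .']+?):\s*(.*)$", re.DOTALL)


def clean_line(raw):
    """Return (character, dialogue) for one transcript line, or None."""
    if raw is None or not raw.strip():  # step 6: drop empty rows
        return None
    text = raw
    for curly, plain in APOSTROPHES.items():  # step 4: standardise apostrophes
        text = text.replace(curly, plain)
    match = SPEAKER.match(text.strip())  # step 10: find the speaker
    if match is None:
        return None  # scene headings, stage directions, etc.
    speaker, dialogue = match.groups()
    dialogue = re.sub(r"\s+", " ", dialogue).strip()  # step 11 (partly): tidy whitespace
    speaker = speaker.strip()
    name = ALIASES.get(speaker.upper(), speaker.title())  # step 13: standardise names
    return name, dialogue
```

Lines that do not start with a speaker label, such as "[Scene: Central Perk]", return None here; the report instead keeps such rows and assigns them scene numbers.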

Sentiment

Total Series Emotion

  • Positivity is the dominant emotion.
  • Positive emotions rank higher than negative emotions (anticipation, joy, and trust vs. fear, sadness, anger, surprise, and disgust).
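Emotion totals like these come from looking each word up in a word-emotion lexicon and tallying its categories. The eight emotions plus positive/negative used here match the categories of the NRC Emotion Lexicon. The tiny lexicon in this Python sketch is invented purely for illustration; its entries are not taken from NRC or from the report.

```python
import re
from collections import Counter

# Invented mini-lexicon for illustration only; a real run would use a full
# word-emotion lexicon with these ten categories.
TOY_LEXICON = {
    "love": {"joy", "trust", "positive"},
    "great": {"joy", "positive"},
    "hell": {"anger", "fear", "negative"},
    "wait": {"anticipation"},
}


def emotion_totals(lines, lexicon=TOY_LEXICON):
    """Tally emotion categories over every word found in the lexicon."""
    totals = Counter()
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            totals.update(lexicon.get(word, ()))
    return totals
```

Grouping the lines by season or by character before calling emotion_totals gives the per-season and per-character breakdowns discussed below.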

Season Emotional Word Totals

  • Seasons 1-4 have fewer emotional words than the later seasons 5-9.

  • Season 10 is the exception, ranked the lowest.

Types of Emotions Across Seasons

  • Anticipation, joy, and positivity increase throughout the seasons up to season 9.

  • All emotions decrease in the last season.

Most negative and positive sentences in the series

Positive Sentences:

character dialogue
41541 Paul Just relax. Just relax Paul, you’re doing great. She likes you. She’ Maybe, she likes you. She likes you. Y’know why? Because you’re a neat guy. You are the man. You are the man! I still got it. Nice and sexy. You’re just a love machine. I’m just a love machine and I won’t work for nobody but you! Hey bab-y! Showtime. I’m just a love machine, yeah ba-by!
47079 Joey Now-now, listen this is just a first draft so’ “We are gathered here today on this joyous occasion to celebrate the special love that Monica and Chandler share.” Eh? “It is a love based on giving and receiving. As well as having and sharing. And the love that they give and have is shared and received. And through this having and giving and sharing and receiving.” “We too can share and love and have and receive.”
8643 Monica No, c’mon, we can’t stop, c’mon, we’ve got three more pounds to go. I am the energy train and you are on board. Woo-woo, woo-woo, woo-woo [Chandler walks out of the apartment, leaving Monica] Woo.
7298 Ross You deserve to be with someone who appreciates you, and who gets how funny and sweet and amazing, and adorable, and sexy you are, you know? Someone who wakes up every morning thinking “Oh my god, I’m with Rachel”. You know, someone who makes you feel good, the way I am with Julie. Was there a second of all?
34727 Rachel Love to love ya baby! Ow! Love to love ya baby! Ow! Love to love ya, baby! Darnit! Ugh.

Negative Sentences:

character dialogue
17694 Phoebe Okay. ‘Jingle bitch screwed me over! Go to hell jingle whore! Go to hell Go to hell. Go to hell-hell-hell.’ That’s all I have so far.
17652 Rachel Okay, see now, what I just heard: blah-blah-blah, blah-blah-blah-blah-blah, blah-blah-blah, blah, blah.
55900 Rachel Horny bitch. No! You’re a horny bitch! Noooo! You’re the horny bitch! No! You’re a horny bitch!
6668 Ross Come on, come on. Damnit, damnit, damnit, damnit. This is all your fault. This is supposed to be, like, the greatest day of my life, y’know? My son is being born, and I should be in there, you know, instead of stuck in a closet with you.
39359 Chandler I always thought having a heart attack was nature’s way of telling you to die! But you’re not gonna die. I mean, you are going to die, but you’re not gonna die today. I wish I was dead.

Sentiment by character

  • Phoebe has the highest usage of emotional words in all categories.
  • Joey has lower usage in everything but anger and surprise.
  • Ross is always in the bottom half for emotional word usage.
  • Female characters are more positive, joyful, and trusting.
  • Excluding Phoebe, the male characters are angrier and more surprised.

How do emotions change over time?

Which seasons are the most emotional for each character?

Total Emotions and Range of Emotion by Character

  • Phoebe is the most emotional character and displays the widest range of emotion.
  • Joey uses the least amount of emotional words and has the smallest range of emotion.
  • Female characters use more emotional words than male characters.

Character Interactions

Get script

Load the script and format character names in title case.

If you get the error Error in gzfile(file, “rb”) : cannot open the connection, move this Rmd file into the folder that contains ‘friends_script.RData’ and try again.

Transforming the table to see with whom and about whom each line was spoken

Filter columns with only main characters

First, we need to change our data frame: replace the character name with “Supporting” if it is not one of the six main characters.

## # A tibble: 100 x 14
##    script   row_id season episode special episode_2 scene scene_number character
##    <chr>     <int> <chr>  <chr>   <chr>   <chr>     <chr>        <dbl> <chr>    
##  1 "[Scene~      2 01     01      ""      ""        Cent~            1 Supporti~
##  2 "Monica~      3 01     01      ""      ""        <NA>             1 Monica   
##  3 "Joey: ~      4 01     01      ""      ""        <NA>             1 Joey     
##  4 "Chandl~      5 01     01      ""      ""        <NA>             1 Chandler 
##  5 "Phoebe~      6 01     01      ""      ""        <NA>             1 Phoebe   
##  6 "(They ~      7 01     01      ""      ""        <NA>             1 Supporti~
##  7 "Phoebe~      8 01     01      ""      ""        <NA>             1 Phoebe   
##  8 "Monica~      9 01     01      ""      ""        <NA>             1 Monica   
##  9 "Chandl~     10 01     01      ""      ""        <NA>             1 Chandler 
## 10 "[Time ~     11 01     01      ""      ""        <NA>             1 Supporti~
## # i 90 more rows
## # i 5 more variables: dialogue <chr>, direction <chr>, prevUnseen <chr>,
## #   Friend <chr>, master_id <int>
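The relabelling step amounts to a one-line membership test. This Python sketch assumes names have already been standardised; missing speakers (scene headings and stage directions) also fall into “Supporting”, which matches the rows shown above.

```python
MAIN_SIX = {"Rachel", "Ross", "Monica", "Chandler", "Joey", "Phoebe"}


def to_friend(character):
    """Keep main-six names; map everyone else, including missing values, to "Supporting"."""
    return character if character in MAIN_SIX else "Supporting"


print([to_friend(c) for c in ["Monica", None, "Gunther"]])
# → ['Monica', 'Supporting', 'Supporting']
```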

Add columns “Talked with …”

Next, we add seven new columns (one for each main character plus “Supporting”) to identify who is talking with whom.

## # A tibble: 100 x 21
##    script   row_id season episode special episode_2 scene scene_number character
##    <chr>     <int> <chr>  <chr>   <chr>   <chr>     <chr>        <dbl> <chr>    
##  1 "[Scene~      2 01     01      ""      ""        Cent~            1 Supporti~
##  2 "Monica~      3 01     01      ""      ""        <NA>             1 Monica   
##  3 "Joey: ~      4 01     01      ""      ""        <NA>             1 Joey     
##  4 "Chandl~      5 01     01      ""      ""        <NA>             1 Chandler 
##  5 "Phoebe~      6 01     01      ""      ""        <NA>             1 Phoebe   
##  6 "(They ~      7 01     01      ""      ""        <NA>             1 Supporti~
##  7 "Phoebe~      8 01     01      ""      ""        <NA>             1 Phoebe   
##  8 "Monica~      9 01     01      ""      ""        <NA>             1 Monica   
##  9 "Chandl~     10 01     01      ""      ""        <NA>             1 Chandler 
## 10 "[Time ~     11 01     01      ""      ""        <NA>             1 Supporti~
## # i 90 more rows
## # i 12 more variables: dialogue <chr>, direction <chr>, prevUnseen <chr>,
## #   Friend <chr>, master_id <int>, talked_with_Rachel <dbl>,
## #   talked_with_Phoebe <dbl>, talked_with_Ross <dbl>, talked_with_Joey <dbl>,
## #   talked_with_Monica <dbl>, talked_with_Chandler <dbl>,
## #   talked_with_Supporting <dbl>
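The report does not spell out how a line is assigned to a listener. One plausible rule, assumed here and not confirmed by the source, is that a speaker talks with every other character who speaks in the same scene (same season, episode and scene_number). Under that assumption, the seven flags can be built like this Python sketch:

```python
from collections import defaultdict

CAST = ["Rachel", "Phoebe", "Ross", "Joey", "Monica", "Chandler", "Supporting"]


def add_talked_with(rows):
    """rows: list of dicts with season, episode, scene_number and character.
    Sets talked_with_<Name> = 1 if <Name> also speaks in the same scene and
    is not the speaker (assumed rule), else 0."""
    present = defaultdict(set)
    for r in rows:
        present[(r["season"], r["episode"], r["scene_number"])].add(r["character"])
    for r in rows:
        scene = (r["season"], r["episode"], r["scene_number"])
        others = present[scene] - {r["character"]}
        for name in CAST:
            r[f"talked_with_{name}"] = int(name in others)
    return rows
```

Excluding the speaker's own name is what keeps the diagonal of the later heatmap empty; note that under this rule two supporting characters talking to each other are not flagged.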

Add columns “Talked about …”

Using the “Talked with” columns, we can create “Talked about” columns.

These columns look for mentions of the main cast’s names and check whether the speaker is talking about themselves.

## # A tibble: 100 x 27
##    script   row_id season episode special episode_2 scene scene_number character
##    <chr>     <int> <chr>  <chr>   <chr>   <chr>     <chr>        <dbl> <chr>    
##  1 "[Scene~      2 01     01      ""      ""        Cent~            1 Supporti~
##  2 "Monica~      3 01     01      ""      ""        <NA>             1 Monica   
##  3 "Joey: ~      4 01     01      ""      ""        <NA>             1 Joey     
##  4 "Chandl~      5 01     01      ""      ""        <NA>             1 Chandler 
##  5 "Phoebe~      6 01     01      ""      ""        <NA>             1 Phoebe   
##  6 "(They ~      7 01     01      ""      ""        <NA>             1 Supporti~
##  7 "Phoebe~      8 01     01      ""      ""        <NA>             1 Phoebe   
##  8 "Monica~      9 01     01      ""      ""        <NA>             1 Monica   
##  9 "Chandl~     10 01     01      ""      ""        <NA>             1 Chandler 
## 10 "[Time ~     11 01     01      ""      ""        <NA>             1 Supporti~
## # i 90 more rows
## # i 18 more variables: dialogue <chr>, direction <chr>, prevUnseen <chr>,
## #   Friend <chr>, master_id <int>, talked_with_Rachel <dbl>,
## #   talked_with_Phoebe <dbl>, talked_with_Ross <dbl>, talked_with_Joey <dbl>,
## #   talked_with_Monica <dbl>, talked_with_Chandler <dbl>,
## #   talked_with_Supporting <dbl>, talked_about_Rachel <dbl>,
## #   talked_about_Phoebe <dbl>, talked_about_Ross <dbl>, ...
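A minimal Python sketch of the “Talked about” flags: a name counts as mentioned if it appears as a whole word in the dialogue, and self-mentions are ignored. Nicknames such as “Rach” or “Pheebs” are not handled here; whether the original code handled them is not stated.

```python
import re

MAIN_SIX = ["Rachel", "Phoebe", "Ross", "Joey", "Monica", "Chandler"]


def add_talked_about(row):
    """Add talked_about_<Name> = 1 when <Name> appears as a whole word in
    the dialogue and the speaker is not <Name> (self-mentions are ignored)."""
    text = row.get("dialogue") or ""
    for name in MAIN_SIX:
        mentioned = re.search(rf"\b{name}\b", text, flags=re.IGNORECASE) is not None
        row[f"talked_about_{name}"] = int(mentioned and row["character"] != name)
    return row
```

The word boundaries mean possessives such as “Ross’s” still count as a mention of Ross.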

Analytics

Talked about - general view

First, we look at which characters are talked with the most and the least.

The graph demonstrates that dialogue is evenly distributed between characters.

Characters talk the most with Rachel and the least with Phoebe; however, the difference is small.

Let us create a similar graph for lines spoken about characters.

This graph demonstrates what characters are most and least talked about. It can measure how much attention they receive.

Ross gets the most attention and Phoebe the least. Ross is mentioned more than twice as often as Phoebe, while Monica and Rachel receive a similar amount of attention.

Pairs Heatmaps

Now that we know who talked with whom and about whom, we can analyze these patterns.

The first tool to consider is a heatmap. It suits this analysis because it shows the frequency for each pair of variables.

The graph also confirms that no character ever talks with themselves.

This heatmap shows that the most frequent conversations are between:

  1. Rachel and Ross (with Ross talking slightly more)
  2. Monica and Chandler (with an almost equal number of lines)
  3. Joey and Chandler (with an almost equal number of lines)

The least frequent conversation pair is Rachel and Chandler, who talk with each other about a third as often as Rachel and Ross do.
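The heatmap is essentially a table of line counts per (speaker, listener) pair. Assuming the talked_with_* flags described earlier, that table can be tallied as in this Python sketch; the function names are illustrative, not the report's.

```python
from collections import Counter

CAST = ["Rachel", "Phoebe", "Ross", "Joey", "Monica", "Chandler", "Supporting"]


def pair_counts(rows, cast=CAST):
    """Count lines per (speaker, listener) pair from talked_with_* flags."""
    counts = Counter()
    for r in rows:
        for name in cast:
            if r.get(f"talked_with_{name}"):
                counts[(r["character"], name)] += 1
    return counts


def as_matrix(counts, cast=CAST):
    """Rows = speaker, columns = listener; this grid is what the heatmap colours."""
    return [[counts[(speaker, listener)] for listener in cast] for speaker in cast]
```

Comparing a cell with its mirror (for example Ross to Rachel versus Rachel to Ross) is how statements like “Ross talking slightly more” can be read off the grid.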

Now let us see the heat map of character talking about each other.

The graph shows that characters talk about each other approximately 50 times less often than they talk with each other.

  • Rachel talks the most about Ross and the least about Phoebe.
  • Monica talks the most about Chandler and the least about Phoebe.
  • Joey talks the most about Ross and the least about Phoebe.
  • Phoebe talks the most about Ross and the least about Joey.
  • Ross talks the most about Rachel and the least about Phoebe.
  • Chandler talks the most about Monica and the least about Phoebe.
  • The supporting cast mentions Ross the most and Phoebe the least.

Spoken to/about per season

Let us see how the amount of dialogue with and about characters changed between seasons.

The graph shows that each character’s share stayed similar across seasons. This suggests the writing is balanced: no main character steals the spotlight or fades into obscurity.

How did the situation change when characters talked about each other?

First, we see that over time the characters talk more and more about each other.

In season 7, Monica stole the spotlight, taking attention away from Ross and Rachel. In general, though, each character’s share of being talked about stays similar, although the variance is higher than for “talked with”.

Speaking to VS Speaking about

Finally, the most interesting thing to see is how the words used when talking with a character differ from the words used when talking about a character.

To do this, we must first tokenize sentences.

Now that the sentences are tokenized, we can do our analysis.

The calculation may take a while because of the size of the dataset, so please be patient when running it.
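Sentence tokenization and the phrase counts can be approximated with a naive regular-expression splitter. This Python sketch is an assumption about the approach: it treats each normalised sentence as a “phrase”, which the report does not confirm, and abbreviations such as “Dr.” will be split incorrectly.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def most_common_sentences(lines, k=10):
    """Most frequent sentences, lowercased and without trailing punctuation."""
    counts = Counter(
        s.lower().rstrip(".!?") for line in lines for s in split_sentences(line)
    )
    return counts.most_common(k)
```

Running most_common_sentences once on lines flagged as “talked with” and once on lines flagged as “talked about” gives the two rankings compared below.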

Most common phrases when talking with characters

Most common phrases when talking about characters

The most popular phrases are similar for “talked with” and “talked about”. This indicates that the characters’ way of speaking does not change noticeably between speaking to a character and speaking about one.

Parts of Speech

Character Self-awareness

Nouns - What does each group interact with?

What is most unique to each group, based on actions? (F left, M right)

Nouns - What does each group talk about?

What are the most frequent topics that each group talks about more than the other? (F left, M right)

Verbs - How does each group act?

Verbs that are associated with each group, shown in total usage. (F left, M right)

Men stand .9x more and speak .5x more than women.

Women check 2x more and agree 3.5x more than men.

Adjectives - How are genders described?

Women are beautiful and hot, men are nice and bad.

Verbosity

All lines of dialogue

Across the entire series, Rachel had the most lines of dialogue and spoke the most words, while Phoebe had the fewest lines of dialogue and spoke the fewest words, even though her lines were the longest on average. Monica spoke more lines than Phoebe and Joey, but because she had the shortest lines on average, she only spoke about 0.5% more words than Phoebe.

All characters became more verbose in Season 2 compared to Season 1, especially Phoebe. Her average dialogue length of 13.62 words per line that season was the highest observed for any character in any season. In contrast, Monica’s 8.98 words per line in season 1 were the lowest.
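Each row of the tables below follows from three quantities per character: the number of lines, the total word count, and their ratio (for example, Phoebe’s 83638 / 7588 ≈ 11.02). A Python sketch, assuming words are counted by splitting on whitespace (the report’s tokenizer may count slightly differently). Setting min_words=4 corresponds to the “meaningful” filter and min_words=40 to the long-line filter used later.

```python
from collections import defaultdict


def verbosity(lines, min_words=1):
    """lines: iterable of (character, dialogue).
    Returns {character: (n_lines, avg_words_per_line, total_words)},
    counting only lines with at least min_words words."""
    n_lines = defaultdict(int)
    n_words = defaultdict(int)
    for character, dialogue in lines:
        k = len(dialogue.split())
        if k >= min_words:
            n_lines[character] += 1
            n_words[character] += k
    return {c: (n_lines[c], round(n_words[c] / n_lines[c], 2), n_words[c]) for c in n_lines}
```

Grouping the same input by season before calling verbosity produces the by-season tables.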

Friends verbosity analysis (all lines of dialogue)
Friend    Lines of dialogue   Avg. words per line   Total words
Phoebe    7588                11.02                 83638
Joey      8223                10.78                 88638
Rachel    9288                10.67                 99064
Ross      9099                10.67                 97110
Chandler  8507                10.39                 88416
Monica    8444                9.96                  84099
Friends verbosity analysis by season: average words per line (all lines of dialogue)
season Chandler Joey Monica Phoebe Rachel Ross
01 10.72 9.66 8.98 10.00 10.44 10.70
02 11.43 11.49 10.47 13.62 11.38 11.08
03 10.81 11.00 9.81 11.66 9.53 9.98
04 10.11 10.25 9.66 11.93 10.24 10.72
05 9.12 10.40 9.50 10.65 11.26 10.60
06 10.39 12.06 9.74 10.53 11.08 11.26
07 10.19 10.42 10.38 9.75 10.75 10.76
08 9.18 11.76 10.74 9.93 10.70 10.35
09 12.02 10.70 10.20 11.55 10.86 11.84
10 9.65 9.71 10.21 10.64 10.37 9.57

Excluding short lines with three or fewer words (meaningful dialogue)

As we will soon see, many lines of dialogue are extremely short, with three or fewer words. Because these lines tend to be very light on content, excluding them gives a better understanding of each character’s contribution. We call the remaining lines “meaningful” lines.

With this adjustment, Rachel also had the most meaningful lines of dialogue and spoke the most words in those lines. This time she also had the longest average line length.

Phoebe again had the fewest lines of dialogue, but Monica spoke the fewest words in these lines.

Phoebe’s Season 2 average meaningful dialogue length of 16.70 words per line was again the highest observed for any character in any season, and Monica’s 11.57 words per meaningful line in Season 1 was again the lowest.

Friends verbosity analysis (excluding lines with three or fewer words)
Friend    Lines of dialogue   Avg. words per line   Total words
Rachel    6674                14.16                 94492
Ross      6622                14.00                 92694
Phoebe    5752                13.97                 80371
Joey      6187                13.74                 84979
Chandler  6444                13.13                 84582
Monica    6286                12.74                 80088
Friends verbosity analysis by season: average words per line (excluding lines with three or fewer words)
season Chandler Joey Monica Phoebe Rachel Ross
01 13.03 12.53 11.57 12.65 13.54 13.77
02 13.56 14.03 13.16 16.70 14.90 14.53
03 13.35 13.74 12.71 14.30 13.34 13.70
04 13.33 13.51 12.62 15.24 14.06 14.37
05 12.29 13.13 12.41 13.66 14.97 14.19
06 13.38 15.28 12.43 13.42 14.73 14.78
07 13.04 13.80 13.35 12.68 14.27 13.54
08 11.59 14.62 13.61 12.90 14.00 13.86
09 14.68 13.32 12.97 14.29 13.91 14.53
10 12.24 12.79 12.56 13.68 13.73 12.70

Only lines with 40 or more words (long lines)

Most lines of dialogue in Friends were short. Ross and Rachel tied for the most “long lines” containing 40 or more words. However, Rachel’s long lines averaged 56.50 words (the highest), while Ross’s averaged just 52.53 words (the lowest). Monica had the fewest long lines by a substantial margin, 42% fewer than Ross and Rachel. Their average length was only slightly higher than Ross’s, at 52.65 words per line, so she spoke 46% fewer words in long lines than Rachel.
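The two percentages in this paragraph can be checked directly from the long-line table (Ross also has 248 long lines, so the line comparison is the same for him):

```python
# Figures copied from the long-line table (lines with 40 or more words).
rachel_lines, rachel_words = 248, 14013
monica_lines, monica_words = 144, 7581

fewer_lines = 1 - monica_lines / rachel_lines
fewer_words = 1 - monica_words / rachel_words
print(f"Monica: {fewer_lines:.0%} fewer long lines, {fewer_words:.0%} fewer words")
# → Monica: 42% fewer long lines, 46% fewer words
```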

Friends verbosity analysis (only lines with 40 or more words)
Friend    Lines of dialogue   Avg. words per line   Total words
Rachel    248                 56.50                 14013
Phoebe    207                 54.09                 11196
Chandler  186                 53.49                 9949
Joey      213                 52.79                 11245
Monica    144                 52.65                 7581
Ross      248                 52.53                 13027
Friends verbosity analysis by season: average words per line (only lines with 40 or more words)
season Chandler Joey Monica Phoebe Rachel Ross
01 57.53 49.70 47.27 54.26 63.55 53.04
02 53.09 49.72 48.94 56.86 60.48 50.53
03 49.27 65.58 55.87 51.50 51.32 52.50
04 52.20 48.14 53.71 53.73 59.50 50.65
05 52.92 46.46 49.62 54.94 56.96 57.16
06 54.20 50.33 50.92 57.36 56.33 47.87
07 55.81 54.88 54.56 47.20 50.94 55.42
08 50.11 58.93 57.71 48.38 53.92 50.00
09 52.31 51.32 56.29 57.44 57.86 58.59
10 66.00 52.93 47.88 56.88 55.12 54.25

Distribution of dialogue line length by Friend

Here we can visualize the distribution of individual line lengths for each Friend using boxplots. On this scale, the difference in average dialogue length is difficult to discern and outlier lines are emphasized.

Ironically, Monica spoke the single longest line of dialogue in the entire series.

Longest individual dialogue line

…and the winner for Longest Line of Dialogue goes to…Monica!

(It was the toast she gave at a party celebrating her and Ross’s parents’ 35th wedding anniversary.)

(197 words. Season 8, episode 18)

No, no it’s going to be great. Really! Mom, Dad, when I got married, one of the things that made me sure I could do it was the amazing example the two of you set for me. For that and so many other things I want to say thank you. I know I probably don’t say it enough, but I love you. When I look around this room, I’m-I’m saddened by the thought of those who could not be here with us. Nana, my beloved grandmother who would so want to be here, but she can’t because she’s dead. As is our dog Chi-Chi. I mean look how cute she is. . Was. Do me a favor and pass this to my parents. Remember she’s dead. Okay, her and Nana, gone. Wow! Hey does anybody remember when Debra Winger had to say goodbye to her children in Terms of Endearment? Didn’t see that? No movie fans?! You want to hear something sad? The other day I was watching 60 Minutes these orphans in Romania, who have been so neglected, they were incapable of love. You people are made of stone! Here’s to mom and dad! Whatever!

Distribution of Friends dialogue lines by word count (all episodes)

As expected, most lines of dialogue are short and steadily become less common as the number of words increases.
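The distribution is a count of lines per word count. A Python sketch, again assuming whitespace word counting; short_share gives the fraction of lines with three or fewer words, the cut-off used for “meaningful” lines above.

```python
from collections import Counter


def length_distribution(dialogues):
    """Map word count -> number of lines with that many words."""
    return Counter(len(d.split()) for d in dialogues)


def short_share(dialogues, max_words=3):
    """Fraction of lines with at most max_words words."""
    dist = length_distribution(dialogues)
    total = sum(dist.values())
    if total == 0:
        return 0.0
    return sum(n for k, n in dist.items() if k <= max_words) / total
```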

Visualization of single-word dialogue lines (all episodes)

Not meaningful.

Visualization of two-word dialogue lines (all episodes)

Somewhat meaningful.

Visualization of three-word dialogue lines (all episodes)

Somewhat more meaningful.

Bigrams & Trigrams

Bigrams & Tokenization

Use bigrams to split the main characters’ dialogue into pairs of consecutive words.

First, we need to make the names more consistent across the main characters.

Create bigrams for the main characters

Trigrams

Use trigrams (n-grams with n = 3) to split the dialogue into sequences of three consecutive words.
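Bigrams and trigrams are both sliding windows over a line’s word tokens. In R this is typically done with tidytext’s unnest_tokens(token = "ngrams", n = 2); the Python sketch below shows the same idea with a simple regex tokenizer and no stop-word removal, so it is an illustration rather than the report’s exact pipeline.

```python
import re


def ngrams(text, n=2):
    """Consecutive n-word sequences from lowercase word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, ngrams("How you doin'?", 2) gives ["how you", "you doin'"], and lines shorter than n words yield no n-grams.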

igraph

Creating a network graph visualization

This code creates network graphs of the bigrams and trigrams, respectively. Edge transparency represents the frequency of each bigram or trigram, and node size represents the number of connections each word has in the graph.
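The two graph quantities described here, edge transparency from frequency and node size from the number of connections, can be computed as in this Python sketch (in R, igraph’s degree() gives the node degree). Scaling alpha by the maximum frequency is an assumption; one option for trigrams is to split each trigram into two word-to-word edges and reuse the same function.

```python
def bigram_graph(bigram_counts):
    """bigram_counts: {"w1 w2": frequency}.
    Returns (edges, degree): each edge carries an alpha in (0, 1] proportional
    to its frequency, and degree counts each word's distinct neighbours."""
    max_freq = max(bigram_counts.values())
    edges = []
    neighbours = {}
    for bigram, freq in bigram_counts.items():
        a, b = bigram.split(" ", 1)
        edges.append((a, b, freq / max_freq))
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {w: len(ns) for w, ns in neighbours.items()}
    return edges, degree
```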

Character-specific Word Cloud

Visualization of Ross’s Top 100 Bigrams

Visualization of Rachel’s Top 100 Bigrams

Visualization of Monica’s Top 100 Bigrams

Visualization of Phoebe’s Top 100 Bigrams

Visualization of Joey’s Top 100 Bigrams

Visualization of Chandler’s Top 100 Bigrams

Visualization of Ross’s Trigram words

Visualization of Rachel’s Trigram words

Visualization of Monica’s Trigram words

Visualization of Phoebe’s Trigram words

Visualization of Joey’s Trigram words

Visualization of Chandler’s Trigram words