Each character profile reports Dialogue, Emotional Range, Self-Awareness, Popularity, and Topics. Distinctive topics by character:

| Character | Topics |
|---|---|
| Rachel | desk, daddy, honey, assistant, shoe, purse, fashion |
| Ross | paleontology, evolution, Hanukkah, fossil, professor, student, son, bike |
| Chandler | gum, bracelet, closet, sperm, nipple, smoke, gym |
| Monica | potatoes, adoption, chef, restaurant, kitchen |
| Phoebe | client, songs, massage, guitar, birth, voice, grandma, meat |
| Joey | script, director, agent, cast, scene, audition, porsche, fridge, duck, soup, pizza |

Dialogue: Number of lines * total words spoken by each character
Emotional range: Standard deviation of each character’s average emotional scores per sentence.
Self-awareness: Percentage of self-referencing words in dialogue.
Popularity: Number of times the character was mentioned by another character.
Topics: Words used by one character more than the others. Character must have mentioned the word a minimum of 10 times and be responsible for at least 40% of the times the word is spoken overall.
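The Topics rule can be illustrated with a minimal base R sketch. The word counts below are made up for illustration, and the column names (`character`, `word`, `n`) are assumptions rather than the names used in the original analysis.

```r
# Toy word counts per character (made-up numbers, for illustration only).
counts <- data.frame(
  character = c("Ross", "Ross", "Joey", "Joey", "Rachel", "Rachel"),
  word      = c("fossil", "pizza", "fossil", "pizza", "fossil", "pizza"),
  n         = c(12, 3, 1, 14, 1, 9),
  stringsAsFactors = FALSE
)

# Total number of times each word is spoken by anyone.
counts$total <- ave(counts$n, counts$word, FUN = sum)

# Share of the word's occurrences that belong to this character.
counts$share <- counts$n / counts$total

# Keep words mentioned at least 10 times by a character who accounts
# for at least 40% of all mentions of that word.
topics <- counts[counts$n >= 10 & counts$share >= 0.40, c("character", "word")]
print(topics)
```

In this toy data, Rachel's 9 mentions of "pizza" fail the 10-mention minimum even though her share is sizeable, which shows why both thresholds matter.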
Aggregating and cleansing the Friends data sets was a challenging task. While most of the HTML pages used consistent formatting, some were subtly different and required different approaches to avoid errors and capture all of the required elements.
The code performs the following data preparation steps for each HTML file:
Seasons 1-4 have fewer emotional words than seasons 5-9.
Season 10 is the exception, ranked the lowest.
Anticipation, joy, and positivity increase throughout the seasons up to season 9.
All emotions decrease in the last season.
| | character | dialogue |
|---|---|---|
| 41541 | Paul | Just relax. Just relax Paul, you’re doing great. She likes you. She’ Maybe, she likes you. She likes you. Y’know why? Because you’re a neat guy. You are the man. You are the man! I still got it. Nice and sexy. You’re just a love machine. I’m just a love machine and I won’t work for nobody but you! Hey bab-y! Showtime. I’m just a love machine, yeah ba-by! |
| 47079 | Joey | Now-now, listen this is just a first draft so’ “We are gathered here today on this joyous occasion to celebrate the special love that Monica and Chandler share.” Eh? “It is a love based on giving and receiving. As well as having and sharing. And the love that they give and have is shared and received. And through this having and giving and sharing and receiving.” “We too can share and love and have and receive.” |
| 8643 | Monica | No, c’mon, we can’t stop, c’mon, we’ve got three more pounds to go. I am the energy train and you are on board. Woo-woo, woo-woo, woo-woo [Chandler walks out of the apartment, leaving Monica] Woo. |
| 7298 | Ross | You deserve to be with someone who appreciates you, and who gets how funny and sweet and amazing, and adorable, and sexy you are, you know? Someone who wakes up every morning thinking “Oh my god, I’m with Rachel”. You know, someone who makes you feel good, the way I am with Julie. Was there a second of all? |
| 34727 | Rachel | Love to love ya baby! Ow! Love to love ya baby! Ow! Love to love ya, baby! Darnit! Ugh. |
| | character | dialogue |
|---|---|---|
| 17694 | Phoebe | Okay. ‘Jingle bitch screwed me over! Go to hell jingle whore! Go to hell Go to hell. Go to hell-hell-hell.’ That’s all I have so far. |
| 17652 | Rachel | Okay, see now, what I just heard: blah-blah-blah, blah-blah-blah-blah-blah, blah-blah-blah, blah, blah. |
| 55900 | Rachel | Horny bitch. No! You’re a horny bitch! Noooo! You’re the horny bitch! No! You’re a horny bitch! |
| 6668 | Ross | Come on, come on. Damnit, damnit, damnit, damnit. This is all your fault. This is supposed to be, like, the greatest day of my life, y’know? My son is being born, and I should be in there, you know, instead of stuck in a closet with you. |
| 39359 | Chandler | I always thought having a heart attack was nature’s way of telling you to die! But you’re not gonna die. I mean, you are going to die, but you’re not gonna die today. I wish I was dead. |
Which seasons are the most emotional for each character?
Load the script and format character names in Title case.
If you get Error in gzfile(file, “rb”) : cannot open the connection, place this Rmd file in the same folder as ‘friends_script.RData’ and try again.
First we need to change our data frame: replace the character name with “Supporting” whenever the speaker is not one of the six main characters.
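A minimal base R sketch of these two steps, run on a handful of made-up speaker names rather than the real script. The helper `to_title` and the variable names are hypothetical and are not taken from the original code.

```r
main_cast <- c("Rachel", "Ross", "Chandler", "Monica", "Phoebe", "Joey")

# Hypothetical helper: lower-case everything, then upper-case the first
# letter of each word (Title case).
to_title <- function(x) {
  gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

# Made-up raw speaker labels with inconsistent capitalisation.
raw_names <- c("MONICA", "joey", "Gunther", "PAUL", "Ross")

speaker <- to_title(raw_names)

# Anyone outside the six main characters is pooled into "Supporting".
speaker[!speaker %in% main_cast] <- "Supporting"
print(speaker)
```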
## # A tibble: 100 x 14
## script row_id season episode special episode_2 scene scene_number character
## <chr> <int> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 "[Scene~ 2 01 01 "" "" Cent~ 1 Supporti~
## 2 "Monica~ 3 01 01 "" "" <NA> 1 Monica
## 3 "Joey: ~ 4 01 01 "" "" <NA> 1 Joey
## 4 "Chandl~ 5 01 01 "" "" <NA> 1 Chandler
## 5 "Phoebe~ 6 01 01 "" "" <NA> 1 Phoebe
## 6 "(They ~ 7 01 01 "" "" <NA> 1 Supporti~
## 7 "Phoebe~ 8 01 01 "" "" <NA> 1 Phoebe
## 8 "Monica~ 9 01 01 "" "" <NA> 1 Monica
## 9 "Chandl~ 10 01 01 "" "" <NA> 1 Chandler
## 10 "[Time ~ 11 01 01 "" "" <NA> 1 Supporti~
## # i 90 more rows
## # i 5 more variables: dialogue <chr>, direction <chr>, prevUnseen <chr>,
## # Friend <chr>, master_id <int>
Now we need to add 7 new columns to identify who is talking with whom
## # A tibble: 100 x 21
## script row_id season episode special episode_2 scene scene_number character
## <chr> <int> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 "[Scene~ 2 01 01 "" "" Cent~ 1 Supporti~
## 2 "Monica~ 3 01 01 "" "" <NA> 1 Monica
## 3 "Joey: ~ 4 01 01 "" "" <NA> 1 Joey
## 4 "Chandl~ 5 01 01 "" "" <NA> 1 Chandler
## 5 "Phoebe~ 6 01 01 "" "" <NA> 1 Phoebe
## 6 "(They ~ 7 01 01 "" "" <NA> 1 Supporti~
## 7 "Phoebe~ 8 01 01 "" "" <NA> 1 Phoebe
## 8 "Monica~ 9 01 01 "" "" <NA> 1 Monica
## 9 "Chandl~ 10 01 01 "" "" <NA> 1 Chandler
## 10 "[Time ~ 11 01 01 "" "" <NA> 1 Supporti~
## # i 90 more rows
## # i 12 more variables: dialogue <chr>, direction <chr>, prevUnseen <chr>,
## # Friend <chr>, master_id <int>, talked_with_Rachel <dbl>,
## # talked_with_Phoebe <dbl>, talked_with_Ross <dbl>, talked_with_Joey <dbl>,
## # talked_with_Monica <dbl>, talked_with_Chandler <dbl>,
## # talked_with_Supporting <dbl>
Using the “Talked with” columns we can create “Talked about” columns
These columns look for mentions of main cast names in each line and check whether the speaker is talking about themselves.
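One way such flags could be built is sketched below in base R. It assumes that a speaker naming themselves should not count as "talking about" that character, which is one reading of the description; the toy data frame and its dialogue are invented for illustration.

```r
main_cast <- c("Rachel", "Ross", "Chandler", "Monica", "Phoebe", "Joey")

# Made-up lines, for illustration only.
script <- data.frame(
  character = c("Ross", "Monica", "Joey"),
  dialogue  = c("I, Ross, am fine.", "Ross and Rachel are late.", "Where is everybody?"),
  stringsAsFactors = FALSE
)

for (name in main_cast) {
  # Whole-word match of the cast member's name in the dialogue.
  mentioned <- grepl(paste0("\\b", name, "\\b"), script$dialogue, perl = TRUE)
  # Assumption: self-mentions are excluded from "talked about".
  script[[paste0("talked_about_", name)]] <-
    as.numeric(mentioned & script$character != name)
}

print(script[, c("character", "talked_about_Ross", "talked_about_Rachel")])
```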
## # A tibble: 100 x 27
## script row_id season episode special episode_2 scene scene_number character
## <chr> <int> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 "[Scene~ 2 01 01 "" "" Cent~ 1 Supporti~
## 2 "Monica~ 3 01 01 "" "" <NA> 1 Monica
## 3 "Joey: ~ 4 01 01 "" "" <NA> 1 Joey
## 4 "Chandl~ 5 01 01 "" "" <NA> 1 Chandler
## 5 "Phoebe~ 6 01 01 "" "" <NA> 1 Phoebe
## 6 "(They ~ 7 01 01 "" "" <NA> 1 Supporti~
## 7 "Phoebe~ 8 01 01 "" "" <NA> 1 Phoebe
## 8 "Monica~ 9 01 01 "" "" <NA> 1 Monica
## 9 "Chandl~ 10 01 01 "" "" <NA> 1 Chandler
## 10 "[Time ~ 11 01 01 "" "" <NA> 1 Supporti~
## # i 90 more rows
## # i 18 more variables: dialogue <chr>, direction <chr>, prevUnseen <chr>,
## # Friend <chr>, master_id <int>, talked_with_Rachel <dbl>,
## # talked_with_Phoebe <dbl>, talked_with_Ross <dbl>, talked_with_Joey <dbl>,
## # talked_with_Monica <dbl>, talked_with_Chandler <dbl>,
## # talked_with_Supporting <dbl>, talked_about_Rachel <dbl>,
## # talked_about_Phoebe <dbl>, talked_about_Ross <dbl>, ...
First we begin with the characters that are talked with the most and the least.
The graph demonstrates that dialogue is evenly distributed between characters.
Characters talk the most with Rachel and the least with Phoebe, although the difference is small.
Let us create a similar graph for lines spoken about characters.
This graph demonstrates which characters are most and least talked about. It can measure how much attention they receive.
Ross gets the most attention and Phoebe the least: Ross is mentioned more than twice as often as Phoebe. Monica and Rachel receive a similar amount of attention.
Now we know who talked with whom and about whom, so we can analyze this.
The first thing to consider is a heatmap. It suits this analysis because it shows the frequency for each pair of variables.
This graph works well because it shows that characters never talk with themselves.
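The counts behind such a heatmap amount to a speaker-by-listener cross-tabulation. The base R sketch below uses a few invented speaker/listener pairs; the column names `speaker` and `listener` are assumptions, not the names in the original data frame.

```r
cast <- c("Rachel", "Ross", "Monica", "Chandler")

# Made-up (speaker, listener) pairs, one row per line of dialogue.
pairs <- data.frame(
  speaker  = c("Ross", "Rachel", "Ross", "Monica", "Chandler", "Ross"),
  listener = c("Rachel", "Ross", "Rachel", "Chandler", "Monica", "Monica"),
  stringsAsFactors = FALSE
)

# Fixing the factor levels keeps every character on both axes,
# even when a pair never occurs.
pair_counts <- table(
  speaker  = factor(pairs$speaker, levels = cast),
  listener = factor(pairs$listener, levels = cast)
)

print(pair_counts)
```

The diagonal of `pair_counts` stays at zero because no row pairs a character with themselves. The matrix can then be passed to any heatmap routine, for example base R's `image()`.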
This heatmap demonstrates that the most frequent conversations are between:
1. Rachel and Ross (with Ross talking slightly more)
2. Monica and Chandler (with an almost equal number of lines)
3. Joey and Chandler (with an almost equal number of lines)

The pair that talks with each other least is Rachel and Chandler. They talk about one third as much as Rachel and Ross.
Now let us see the heatmap of characters talking about each other.
The graph demonstrates that characters talk about others roughly 50 times less often than they talk with others.
- Rachel talks the most about Ross and the least about Phoebe
- Monica talks the most about Chandler and the least about Phoebe
- Joey talks the most about Ross and the least about Phoebe
- Phoebe talks the most about Ross and the least about Joey
- Ross talks the most about Rachel and the least about Phoebe
- Chandler talks the most about Monica and the least about Phoebe
- Supporting cast mention Ross the most and Phoebe the least
Let us see how the amount of dialogue with and about characters changed between seasons.
The graph demonstrates that the share of each character stayed similar across seasons. This suggests the writing is balanced: no main character steals the spotlight or fades into obscurity.
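Per-season shares of this kind can be computed with a row-normalised table. The sketch below uses invented rows and assumed column names (`season`, `talked_with`); it illustrates the calculation and is not the original code.

```r
# Made-up records: one row per line, with the character being addressed.
spoken <- data.frame(
  season      = c("01", "01", "01", "01", "02", "02", "02", "02"),
  talked_with = c("Rachel", "Ross", "Rachel", "Joey", "Rachel", "Ross", "Ross", "Joey"),
  stringsAsFactors = FALSE
)

# Count lines per season and character, then divide each row by its total
# so that every season sums to 1.
counts <- table(spoken$season, spoken$talked_with)
shares <- prop.table(counts, margin = 1)

print(round(shares, 2))
```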
How did the situation change when characters talked about each other?
First, we see that over time characters talk more and more about each other.
In season 07 Monica stole the spotlight, taking attention from Ross and Rachel. In general, though, each character's share of being talked about stays similar, although the variance is higher than for the "talked with" shares.
Finally, the most interesting thing to see is how words spoken with a character differ from words spoken about a character.
To do this, we must first tokenize sentences.
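As a rough illustration of sentence tokenization, the base R sketch below splits a line wherever whitespace follows a terminal punctuation mark. This simple rule is an assumption made for the example; the original analysis may rely on a dedicated tokenizer, and the rule will mis-split abbreviations such as "Dr. Geller".

```r
# Made-up line of dialogue.
dialogue <- "No! You're late. Why?"

# Split on whitespace that is preceded by '.', '!' or '?'.
# The lookbehind keeps the punctuation attached to its sentence.
sentences <- strsplit(dialogue, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

print(sentences)
```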
Now that the sentences are tokenized, we can do our analysis.
The calculation may take a while because of the size of the dataset, so please be patient after launching it.
Most common phrases when talking about characters
The most popular phrases are similar between “talked with” and “talked about”. This indicates that the characters’ way of speaking does not change noticeably between speaking to characters and speaking about them.
What is the most unique to each group based on actions? (F left, M right)
What are the most frequent topics that each group talks about more than the other? (F left, M right)
Verbs that are associated with each group, shown in total usage. (F left, M right)
Men stand .9x more and speak .5x more than women.
Women check 2x more and agree 3.5x more than men.
Women are beautiful and hot, men are nice and bad.
Across the entire series, Rachel had the most lines of dialogue and spoke the most words, while Phoebe had the fewest lines of dialogue and spoke the fewest words, even though her lines were the longest on average. Monica spoke more lines than Phoebe and Joey, but because she had the shortest lines on average, she only spoke about 0.5% more words than Phoebe.
All characters became more verbose in Season 2 compared to Season 1, especially Phoebe. Her average dialogue length of 13.62 words per line that season was the highest observed for any character in any season. In contrast, Monica’s 8.98 words per line in season 1 were the lowest.
| Friend | Lines of dialogue | Average dialogue length | Total words |
|---|---|---|---|
| Phoebe | 7588 | 11.02 | 83638 |
| Joey | 8223 | 10.78 | 88638 |
| Rachel | 9288 | 10.67 | 99064 |
| Ross | 9099 | 10.67 | 97110 |
| Chandler | 8507 | 10.39 | 88416 |
| Monica | 8444 | 9.96 | 84099 |
| season | Chandler | Joey | Monica | Phoebe | Rachel | Ross |
|---|---|---|---|---|---|---|
| 01 | 10.72 | 9.66 | 8.98 | 10.00 | 10.44 | 10.70 |
| 02 | 11.43 | 11.49 | 10.47 | 13.62 | 11.38 | 11.08 |
| 03 | 10.81 | 11.00 | 9.81 | 11.66 | 9.53 | 9.98 |
| 04 | 10.11 | 10.25 | 9.66 | 11.93 | 10.24 | 10.72 |
| 05 | 9.12 | 10.40 | 9.50 | 10.65 | 11.26 | 10.60 |
| 06 | 10.39 | 12.06 | 9.74 | 10.53 | 11.08 | 11.26 |
| 07 | 10.19 | 10.42 | 10.38 | 9.75 | 10.75 | 10.76 |
| 08 | 9.18 | 11.76 | 10.74 | 9.93 | 10.70 | 10.35 |
| 09 | 12.02 | 10.70 | 10.20 | 11.55 | 10.86 | 11.84 |
| 10 | 9.65 | 9.71 | 10.21 | 10.64 | 10.37 | 9.57 |
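The three summary columns (lines of dialogue, average dialogue length, total words) can be derived from per-line word counts. The base R sketch below uses a few invented lines and assumes words are separated by whitespace; the column names are illustrative.

```r
# Made-up lines, for illustration only.
script <- data.frame(
  friend   = c("Monica", "Monica", "Ross"),
  dialogue = c("Woo.", "I cleaned the whole kitchen.", "We need to talk now."),
  stringsAsFactors = FALSE
)

# Words per line: split on runs of whitespace and count the pieces.
script$words <- lengths(strsplit(trimws(script$dialogue), "\\s+"))

n_lines <- tapply(script$words, script$friend, length)
total   <- tapply(script$words, script$friend, sum)

summary_tbl <- data.frame(
  friend      = names(n_lines),
  lines       = as.vector(n_lines),
  avg_length  = round(as.vector(total / n_lines), 2),
  total_words = as.vector(total)
)

print(summary_tbl)
```

Because the average is total words divided by lines, a table row can be sanity-checked by multiplying its first two numeric columns.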
As we will soon see, many of the lines of dialogue are extremely short, with three or fewer words. As these lines tend to be very light on content, we can gain a better understanding of each character’s contribution by excluding them. We can call the remaining lines “meaningful” lines.
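A minimal base R sketch of that filter, assuming a word is any whitespace-separated token and using invented example lines:

```r
# Made-up lines, for illustration only.
dialogue <- c("Hi!", "Oh my God.", "I know, right?", "We should really get going now.")

# Count whitespace-separated words in each line.
words <- lengths(strsplit(trimws(dialogue), "\\s+"))

# "Meaningful" lines are those with more than three words.
meaningful <- dialogue[words > 3]

print(meaningful)
```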
With this adjustment, Rachel again had the most lines of dialogue and spoke the most words in those lines. This time she also had the longest average line length.
Phoebe again had the fewest lines of dialogue, but Monica spoke the fewest words in these lines.
Phoebe’s Season 2 average meaningful dialogue length of 16.7 words per line was again the highest observed for any character in any season, and Monica’s 11.57 words per meaningful line in Season 1 was again the lowest.
| Friend | Lines of dialogue | Average dialogue length | Total words |
|---|---|---|---|
| Rachel | 6674 | 14.16 | 94492 |
| Ross | 6622 | 14.00 | 92694 |
| Phoebe | 5752 | 13.97 | 80371 |
| Joey | 6187 | 13.74 | 84979 |
| Chandler | 6444 | 13.13 | 84582 |
| Monica | 6286 | 12.74 | 80088 |
| season | Chandler | Joey | Monica | Phoebe | Rachel | Ross |
|---|---|---|---|---|---|---|
| 01 | 13.03 | 12.53 | 11.57 | 12.65 | 13.54 | 13.77 |
| 02 | 13.56 | 14.03 | 13.16 | 16.70 | 14.90 | 14.53 |
| 03 | 13.35 | 13.74 | 12.71 | 14.30 | 13.34 | 13.70 |
| 04 | 13.33 | 13.51 | 12.62 | 15.24 | 14.06 | 14.37 |
| 05 | 12.29 | 13.13 | 12.41 | 13.66 | 14.97 | 14.19 |
| 06 | 13.38 | 15.28 | 12.43 | 13.42 | 14.73 | 14.78 |
| 07 | 13.04 | 13.80 | 13.35 | 12.68 | 14.27 | 13.54 |
| 08 | 11.59 | 14.62 | 13.61 | 12.90 | 14.00 | 13.86 |
| 09 | 14.68 | 13.32 | 12.97 | 14.29 | 13.91 | 14.53 |
| 10 | 12.24 | 12.79 | 12.56 | 13.68 | 13.73 | 12.70 |
Most lines of dialogue in Friends were short. Ross and Rachel tied for the most “long lines” containing 40 or more words. However, Rachel’s long lines averaged 56.5 words (the highest) while Ross’s averaged just 52.53 words (the lowest). Monica had the fewest long lines by a substantial margin, 42% fewer than Ross and Rachel. The average length of her long lines, 52.65 words per line, was only slightly higher than Ross’s, so she spoke 46% fewer words in long lines than Rachel.
| Friend | Lines of dialogue | Average dialogue length | Total words |
|---|---|---|---|
| Rachel | 248 | 56.50 | 14013 |
| Phoebe | 207 | 54.09 | 11196 |
| Chandler | 186 | 53.49 | 9949 |
| Joey | 213 | 52.79 | 11245 |
| Monica | 144 | 52.65 | 7581 |
| Ross | 248 | 52.53 | 13027 |
| season | Chandler | Joey | Monica | Phoebe | Rachel | Ross |
|---|---|---|---|---|---|---|
| 01 | 57.53 | 49.70 | 47.27 | 54.26 | 63.55 | 53.04 |
| 02 | 53.09 | 49.72 | 48.94 | 56.86 | 60.48 | 50.53 |
| 03 | 49.27 | 65.58 | 55.87 | 51.50 | 51.32 | 52.50 |
| 04 | 52.20 | 48.14 | 53.71 | 53.73 | 59.50 | 50.65 |
| 05 | 52.92 | 46.46 | 49.62 | 54.94 | 56.96 | 57.16 |
| 06 | 54.20 | 50.33 | 50.92 | 57.36 | 56.33 | 47.87 |
| 07 | 55.81 | 54.88 | 54.56 | 47.20 | 50.94 | 55.42 |
| 08 | 50.11 | 58.93 | 57.71 | 48.38 | 53.92 | 50.00 |
| 09 | 52.31 | 51.32 | 56.29 | 57.44 | 57.86 | 58.59 |
| 10 | 66.00 | 52.93 | 47.88 | 56.88 | 55.12 | 54.25 |
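The percentage comparisons for long lines can be reproduced directly from the line and word totals in the summary table. The base R snippet below only re-does that arithmetic; the variable names are illustrative.

```r
# Long-line counts and word totals copied from the summary table.
long_lines <- c(Rachel = 248, Ross = 248, Monica = 144)
long_words <- c(Rachel = 14013, Ross = 13027, Monica = 7581)

# Relative shortfall of Monica compared with Rachel.
fewer_lines <- 1 - long_lines[["Monica"]] / long_lines[["Rachel"]]
fewer_words <- 1 - long_words[["Monica"]] / long_words[["Rachel"]]

print(round(100 * fewer_lines))  # percent fewer long lines
print(round(100 * fewer_words))  # percent fewer words in long lines
```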
Here we can visualize the distribution of individual line lengths for each Friend using boxplots. On this scale, the difference in average dialogue length is difficult to discern and outlier lines are emphasized.
Ironically, Monica spoke the single longest line of dialogue in the entire series.
…and the winner for Longest Line of Dialogue goes to…Monica!
(It was the toast she gave at a party celebrating her and Ross’s parents’ 35th wedding anniversary.)
(197 words. Season 8, episode 18)
No, no it’s going to be great. Really! Mom, Dad, when I got married, one of the things that made me sure I could do it was the amazing example the two of you set for me. For that and so many other things I want to say thank you. I know I probably don’t say it enough, but I love you. When I look around this room, I’m-I’m saddened by the thought of those who could not be here with us. Nana, my beloved grandmother who would so want to be here, but she can’t because she’s dead. As is our dog Chi-Chi. I mean look how cute she is. . Was. Do me a favor and pass this to my parents. Remember she’s dead. Okay, her and Nana, gone. Wow! Hey does anybody remember when Debra Winger had to say goodbye to her children in Terms of Endearment? Didn’t see that? No movie fans?! You want to hear something sad? The other day I was watching 60 Minutes these orphans in Romania, who have been so neglected, they were incapable of love. You people are made of stone! Here’s to mom and dad! Whatever!
As expected, most lines of dialogue are short, and lines steadily become less common as the number of words increases.
Unmeaningful.
Somewhat meaningful.
Somewhat more meaningful.
Use bigrams to extract pairs of consecutive words spoken by the main characters.
First we need to make the names of the main characters consistent.
Create bigrams for the main characters.
Use trigrams (n-grams with n = 3) to extract runs of three consecutive words in the dialogue.
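Bigrams and trigrams differ only in the window size, so both can be illustrated with one small function. The base R sketch below is a simplified stand-in for a tokenizer package; the helper name `make_ngrams` and its cleaning rule (keep letters and apostrophes, lower-case everything) are assumptions made for this example.

```r
# Hypothetical helper: return all runs of n consecutive words in a line.
make_ngrams <- function(text, n) {
  # Replace everything except letters, apostrophes and spaces, then lower-case.
  cleaned <- tolower(gsub("[^A-Za-z' ]", " ", text))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) < n) {
    return(character(0))
  }
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

line <- "We were on a break!"
bigrams  <- make_ngrams(line, 2)
trigrams <- make_ngrams(line, 3)

print(bigrams)
print(trigrams)
```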
Creating a network graph visualization
This chunk of code creates network graphs of the bigrams and trigrams respectively. The transparency of each edge represents the frequency of the bigram or trigram, and the size of each node represents the number of connections that word has in the graph.
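The two quantities driving that visual encoding, edge frequency and node connection count, can be computed before any plotting. The base R sketch below works on a made-up bigram vector; scaling the transparency as `n / max(n)` is an assumption for illustration, and the resulting tables could be handed to any graph-plotting package.

```r
# Made-up bigrams, for illustration only.
bigrams <- c("oh my", "my god", "oh my", "oh god", "you know", "oh my")

# Frequency of each distinct bigram becomes the edge weight.
edge_counts <- table(bigrams)

edges <- data.frame(
  from = sub(" .*$", "", names(edge_counts)),  # first word of the bigram
  to   = sub("^.* ", "", names(edge_counts)),  # second word of the bigram
  n    = as.vector(edge_counts),
  stringsAsFactors = FALSE
)

# Assumed scaling: most frequent edge is fully opaque (alpha = 1).
edges$alpha <- edges$n / max(edges$n)

# Node size: number of distinct edges each word takes part in.
degree <- table(c(edges$from, edges$to))

print(edges)
print(degree)
```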