Main Characters
First look at the data
The main characters of the series Friends are Chandler, Joey, Monica, Phoebe, Rachel and Ross. We will now focus on them.
First, we will see who speaks the most words over the whole series.
Speaker Occurrences
Rachel 19531
Ross 19194
Joey 18171
Chandler 17811
Monica 17434
Phoebe 15848
Rachel and Ross speak the most words in the series, more than 19,000 each, while Phoebe, in last place, speaks fewer than 16,000.
Not surprisingly, Rachel says the most words: as her character herself says, she talks a lot and sometimes relieves her stress by talking.
It is also interesting to count each character's unique words, to see who has the most diverse vocabulary in the series.
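The per-speaker word count above can be sketched in a few lines of plain Python. This is only a minimal illustration: the `(speaker, text)` layout, the function name `words_per_speaker` and the sample lines are assumptions, not the dataset's actual schema or content, and words are split on whitespace only.

```python
from collections import Counter


def words_per_speaker(lines):
    """Count the total number of words spoken by each speaker.

    `lines` is an iterable of (speaker, text) pairs. This layout is an
    assumption made for the example, not the real dataset schema.
    """
    totals = Counter()
    for speaker, text in lines:
        # Whitespace tokenisation; the original normalisation may differ.
        totals[speaker] += len(text.split())
    return totals


# Toy lines, invented purely for illustration.
sample_lines = [
    ("Rachel", "oh my god i cannot believe it"),
    ("Ross", "we were on a break"),
    ("Rachel", "no we were not"),
]

totals = words_per_speaker(sample_lines)
for speaker, count in totals.most_common():
    print(speaker, count)
```

`Counter.most_common()` returns the speakers sorted from the most to the least talkative, which matches the ordering used in the table.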
Speaker Unique words
Chandler 3216
Ross 3191
Joey 3004
Phoebe 2878
Monica 2859
Rachel 2809
By this measure, Chandler has the most varied vocabulary, while Rachel, the character who speaks the most words in the series, is also the main character with the least varied vocabulary, with about 2,800 unique words.
Again, this fits Chandler's character well, since he often plays the sarcastic and erudite one.
Note that Ross, who ranks second for total words, also ranks second for unique words.
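Counting unique words only requires collecting each speaker's words into a set. The sketch below assumes the same hypothetical `(speaker, text)` layout and lower-cases the text before splitting; the original pipeline may normalise differently, and the sample lines are invented.

```python
from collections import defaultdict


def unique_words_per_speaker(lines):
    """Return the number of distinct words used by each speaker."""
    vocab = defaultdict(set)
    for speaker, text in lines:
        # A set keeps each word once, however often it is repeated.
        vocab[speaker].update(text.lower().split())
    return {speaker: len(words) for speaker, words in vocab.items()}


# Toy lines, invented purely for illustration.
sample_lines = [
    ("Chandler", "could this be any more obvious"),
    ("Rachel", "no no no no"),
    ("Chandler", "this is obvious"),
]

unique_counts = unique_words_per_speaker(sample_lines)
print(unique_counts)
```

A speaker who repeats the same few words many times scores high on total words but low here, which is the contrast the two tables bring out.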
Now let us look at the most frequently spoken words in the series.
[Figure: word_cloud_minor_characters]
The top 20 words spoken by the main characters are not very informative on their own.
The word “oh” is spoken more than 3,500 times, and the top 20 consists mainly of simple words and the characters' first names, reflecting the simple nature of the series.
This is not very surprising: as mentioned before, the format of the series and its character types encourage short answers and a lot of reactions.
Because this top 20 says little about the words actually used, a list of words to exclude from the analysis was created. It contains short expressions used mainly in speech, as well as the names of the main characters.
['oh', 'yeah', 'okay', 'hey', "ok", "hi", "ross", "uh","joey","chandler","monica", "got", "phoebe", "rachel", "one", "yes"]
Here is a word cloud generated after excluding this word list:
This word cloud is again made up of fairly simple words, but it represents Friends well: a slice-of-life series with no specific theme beyond the everyday adventures of a group of friends.
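The filtering step described above can be sketched as follows. The exclusion set reuses the list given in the text; the function name `top_words`, the whitespace tokenisation and the sample sentences are assumptions made for this illustration.

```python
from collections import Counter

# Exclusion list taken from the text above.
EXCLUDED = {
    "oh", "yeah", "okay", "hey", "ok", "hi", "ross", "uh", "joey",
    "chandler", "monica", "got", "phoebe", "rachel", "one", "yes",
}


def top_words(texts, excluded=EXCLUDED, n=20):
    """Return the n most frequent words that are not in `excluded`."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in excluded
    )
    return counts.most_common(n)


# Toy sentences, invented purely for illustration.
sample_texts = ["oh hey ross", "i know i know", "yeah you know what"]

top = top_words(sample_texts, n=2)
print(top)
```

The resulting counts could then be passed to a word cloud library; that step is left out here because the text does not say which tool was used.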
Bigram
Since single words proved inconclusive, we now turn to bigrams, combinations of two consecutive words, which give more context.
As with single words, a list of words to exclude was created to give a more meaningful result.
In addition, bigrams that repeat the same word have been removed from the visualization (e.g. “Yeah Yeah”, “Well Well”).
['oh', 'yeah', 'hi', 'hey', 'okay', 'uh', 'huh', 'whoa']
The results here are more interesting: the bigram “let go” appears more than 80 times. It almost certainly corresponds to “Let’s go”, with the “s” and the apostrophe removed during normalization of the dataset.
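A minimal sketch of the bigram count follows, under two assumptions that the text does not confirm: a bigram is dropped as soon as either of its words is in the exclusion list, and words are split on whitespace. The function name `count_bigrams` and the sample sentences are hypothetical.

```python
from collections import Counter

# Exclusion list taken from the text above.
BIGRAM_EXCLUDED = {"oh", "yeah", "hi", "hey", "okay", "uh", "huh", "whoa"}


def count_bigrams(texts, excluded=BIGRAM_EXCLUDED):
    """Count pairs of adjacent words, skipping unwanted bigrams."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        # zip(words, words[1:]) yields every pair of adjacent words.
        for first, second in zip(words, words[1:]):
            if first == second:
                continue  # repeated word, e.g. "yeah yeah"
            if first in excluded or second in excluded:
                continue  # assumption: drop the pair if either word is excluded
            counts[(first, second)] += 1
    return counts


# Toy sentences, invented purely for illustration.
sample_texts = ["well well let go", "oh let go now", "yeah yeah"]

bigram_counts = count_bigrams(sample_texts)
print(bigram_counts.most_common(3))
```

Dropping the pair, rather than deleting the excluded word first, keeps only words that were truly adjacent in the script; removing the words before pairing would instead create bigrams that never occurred.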
Sentiment Analysis
The bigram analysis showed the rather neutral character of the series, with its heavy repetition of basic expressions. We now turn to the sentiment of the words in the script. For this we created a new column in which the VADER algorithm assigns a coefficient to each bigram. This coefficient, between -1 and 1, makes it possible to label the bigram as positive, negative or neutral.
This bar chart shows how dominant neutrality is across all the bigrams of the main characters, underlining the universal character of the series. The low rate of negative, and even positive, bigrams shows a lack of divisive words, which is consistent with the success of a series that takes no risks with its text.
It is also interesting to look at this sentiment in more detail and see how it breaks down by main character.
The result is quite striking: all the main characters have sentiment rates in the same proportions, namely about 60% neutral, 10% negative and 30% positive. The main characters are written the same way as far as sentiment is concerned, and come across as rather smooth because so many of their bigrams are neutral.
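Turning the coefficient into a label can be sketched as a simple threshold rule. The text does not say which cut-offs were used; the value of 0.05 below is the threshold commonly suggested for VADER's compound score and is an assumption here, as is the function name `label_sentiment`. The score itself would typically come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` or `nltk` package, which is not called in this sketch.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed, conventional cut-off; the
    original analysis may have used different values.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hand-picked scores, not produced by VADER on real script text.
for score in (0.6, 0.0, -0.3):
    print(score, label_sentiment(score))
```

With a symmetric threshold, any bigram whose words carry no strong valence lands in the neutral band around zero.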
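The per-character breakdown amounts to computing, for each speaker, the share of bigrams carrying each label. Here is a minimal sketch, assuming hypothetical `(speaker, label)` rows; the sample rows are invented and do not reproduce the proportions reported above.

```python
from collections import Counter, defaultdict


def sentiment_shares(rows):
    """Return, per speaker, the fraction of rows carrying each label."""
    counts = defaultdict(Counter)
    for speaker, label in rows:
        counts[speaker][label] += 1
    shares = {}
    for speaker, labels in counts.items():
        total = sum(labels.values())
        shares[speaker] = {label: n / total for label, n in labels.items()}
    return shares


# Toy rows, invented purely for illustration.
sample_rows = [
    ("Monica", "neutral"), ("Monica", "neutral"),
    ("Monica", "positive"), ("Monica", "negative"),
    ("Joey", "neutral"), ("Joey", "positive"),
]

shares = sentiment_shares(sample_rows)
print(shares)
```

Comparing shares rather than raw counts is what makes the characters comparable, since they do not all speak the same number of bigrams.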
Enrichment
Following the text analysis detailed above, we decided to enrich our dataframe to explore other avenues of analysis. Using a dataframe listing the IMDb ratings of the episodes, we looked for possible correlations between an episode's rating and the number of words spoken in it by the main characters.
We observe two trends here. Season nine episodes exceeding 1,000 words (the average is around 500 words) were less appreciated by the public. Conversely, season 10 episodes exceeding 1,000 spoken words have slightly higher ratings, which may also reflect judgment being swayed by the end of the series.
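One way to quantify such a relationship is the Pearson correlation coefficient between words per episode and rating. The text does not state which measure, if any, was computed, so this is only a sketch of one possible approach; the function name `pearson` and the sample values are invented and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented, perfectly linear toy values (not real episode data):
# ratings drop as the word count rises.
word_counts = [400, 600, 800, 1000]
ratings = [9.0, 8.5, 8.0, 7.5]

r = pearson(word_counts, ratings)
print(round(r, 3))
```

Computing the coefficient separately for each season would show whether the trend changes sign between seasons, as the observations for seasons nine and ten suggest.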