Friends Project - Text Mining and Web Scraping



Introduction



“Friends” is an American television series created by David Crane and Marta Kauffman, which aired from 1994 to 2004. It follows a group of six friends (Ross, Rachel, Chandler, Monica, Joey and Phoebe) who live in New York and go through the ups and downs of adult life together.

The series is interesting to analyze for several reasons. First of all, it was a huge success with the audience, who identified with the characters and their stories. It also received critical acclaim for its acting performances, humorous writing and universal themes.

In addition, “Friends” was one of the first shows to realistically address issues such as sexuality, diversity and intergenerational friendship in a fun and accessible way.

Finally, the series had a significant cultural impact, influencing the fashion, music, and popular language of the time, as well as the way sitcoms have been produced and broadcast since.

In sum, an analysis of “Friends” would provide insight into the impact of popular culture on society, as well as the universal and timeless themes of friendship, love, and personal growth.

To carry out the analysis, we were provided with the complete transcripts of all seasons of “Friends”.

- The first objective is to clean and prepare the data to produce one or more datasets for further analysis.

- The second objective is to demonstrate our ability to extract information from a text.

To meet these two objectives, we used the following libraries:

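# Python interoperability
library(reticulate)

# Data manipulation, text mining and table output
library(tidyverse)   # also attaches dplyr and stringr
library(tidytext)
library(dplyr)
library(kableExtra)

# Web scraping and string cleaning
library(rvest)
library(xml2)
library(stringr)

# Friends transcripts and episode metadata
library(friends)

# Reshaping, stemming, corpus handling and word clouds
library(reshape2)
library(SnowballC)
library(tm)
library(wordcloud)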

Cleaning and Data Preparation

The first step was to retrieve the titles of the episodes. After running our extraction code, we obtained the following list (first rows shown):

Titles
The One Where Monica Gets a New Roommate (The Pilot-The Uncut Version)
The One With the Sonogram at the End
The One With the Thumb
The One With George Stephanopoulos
The One With the East German Laundry Detergent
The One With the Butt
The One With the Blackout
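The scraping code itself is not shown. As a minimal sketch of the cleaning step, assuming (hypothetically) that each transcript page is saved under a file name of the form SSEE.html (two digits for the season, two for the episode) and that the raw title text has already been extracted, season and episode numbers can be parsed from the file name and the titles normalised in base R. With rvest, the raw titles would typically come from read_html(), html_elements() and html_text2(), but the page structure those calls depend on is not given here.

```r
# Hypothetical inputs: transcript file names following an SSEE.html scheme
# and raw page titles with stray whitespace, as scraping often returns them
files  <- c("0101.html", "0102.html", "0103.html")
titles <- c("  The One Where Monica Gets a New Roommate ",
            "The One With the Sonogram\n at the End",
            "The One With the  Thumb")

# Season = first two digits, episode = next two digits of the file name
season  <- as.integer(substr(files, 1, 2))
episode <- as.integer(substr(files, 3, 4))

# Collapse runs of whitespace (spaces, newlines) into one space, then trim the ends
title <- trimws(gsub("\\s+", " ", titles))

episodes <- data.frame(season, episode, title, stringsAsFactors = FALSE)
print(episodes)
```

Keeping season and episode as separate integer columns makes later joins with other tables straightforward.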








Next, we retrieved the text of the episodes. After some cleaning, the output is the following:

ID Character Text Interaction Season Episode
11 NA Central Perk, Chandler, Joey, Phoebe, and Monica are there. 1 1
11 MONICA There’s nothing to tell! He’s just some guy I work with! 1 1
11 JOEY C’mon, you’re going out with the guy! There’s gotta be something wrong with him! 1 1
11 CHANDLER All right Joey, be nice.  So does he have a hump? A hump and a hairpiece? 1 1
11 PHOEBE Wait, does he eat chalk? 1 1
11 NA They all stare, bemused. 1 1
11 PHOEBE Just, ’cause, I don’t want her to go through what I went through with Carl- oh! 1 1
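In the table above, scene descriptions and stage directions have no speaker (NA in the Character column), while dialogue lines carry the speaker’s name in capitals. A minimal base R sketch of that split, assuming (hypothetically) that each raw line is either “Name: text” or a description wrapped in square brackets or parentheses; the real transcript format may differ:

```r
# Hypothetical raw lines: "Name: text" for dialogue,
# [brackets] or (parentheses) for scene descriptions and stage directions
raw <- c("[Scene: Central Perk, Chandler, Joey, Phoebe, and Monica are there.]",
         "Monica: There's nothing to tell! He's just some guy I work with!",
         "Joey: C'mon, you're going out with the guy!",
         "(They all stare, bemused.)")

# A dialogue line starts with a name (letters, spaces, dots, apostrophes, hyphens) and a colon
pattern   <- "^([A-Za-z][A-Za-z .'-]*):\\s*(.*)$"
is_speech <- grepl(pattern, raw)

# Speaker in upper case for dialogue, NA for directions
speaker <- ifelse(is_speech, toupper(sub(pattern, "\\1", raw)), NA)

# Dialogue keeps the text after the colon; directions lose their outer brackets
text <- ifelse(is_speech,
               sub(pattern, "\\2", raw),
               gsub("^[\\[(]|[\\])]$", "", raw, perl = TRUE))

transcript <- data.frame(Character = speaker, Text = text, stringsAsFactors = FALSE)
print(transcript)
```

Keeping directions as rows with an NA speaker, rather than dropping them, preserves the order of events within a scene.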















We then retrieved the authors of each episode. After some cleaning, the output is the following:

Authors Season Episode ID
Marta Kauffman & David Crane 1 1 11
Marta Kauffman & David Crane 1 2 12
Jeffrey Astrof & Mike Sikowitz. 1 3 13
Alexa Junge 1 4 14
Jeff Greenstein & Jeff Strauss 1 5 15
Adam Chase & Ira Ungerleider 1 6 16
Jeffrey Astrof and Mike Sikowitz. 1 7 17
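The author strings above show the kind of inconsistencies that remain after scraping: a trailing period, and “and” used instead of “&” for the same pair of writers (episodes 3 and 7). A minimal base R sketch of normalising them and splitting each entry into individual writers, using strings from the table as input:

```r
# Author strings as they appear in the table above
authors <- c("Marta Kauffman & David Crane",
             "Jeffrey Astrof & Mike Sikowitz.",
             "Jeffrey Astrof and Mike Sikowitz.")

# Drop trailing periods, then use "&" as the single separator
authors_clean <- sub("\\.$", "", trimws(authors))
authors_clean <- gsub("\\s+(and|&)\\s+", " & ", authors_clean)

# One element per writer, e.g. for counting episodes per writer
writers <- strsplit(authors_clean, " & ", fixed = TRUE)
writer_counts <- sort(table(unlist(writers)), decreasing = TRUE)
print(writer_counts)
```

Counting writers this way credits each person once per episode, instead of treating “A & B” as a separate author from A and B.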











Finally, we enriched the data with the friends package, which adds episode metadata such as the director, air date, US viewership and IMDb rating. The final dataset looks like this:

ID ID2 title season episode scene directed_by written_by air_date us_views_millions imdb_rating speaker text utterance
10110 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Scene Directions [Scene: Barbados, Monica and Chandler’s Room. They both enter from Ross’s room. Monica still has her big, frizzy hair.] 0
10111 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Monica Oh, the way you crushed Mike at ping pong was such a turn-on.You wanna…? 1
10112 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Chandler You know, I’d love to, but I’m a little tired. 2
10113 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Monica I’ll put a pillowcase over my head. 3
10114 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Chandler You’re on! 4
10115 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Phoebe Hey! 5
10116 101 The One After Joey and Rachel Kiss 10 1 1 Kevin S. Bright Andrew Reich & Ted Cohen 2003-09-25 24.54 8.5 Monica What’s up? 6
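The ID columns in these tables appear to concatenate season and episode numbers (11 for season 1, episode 1; 101 for season 10, episode 1), with scene and line numbers appended for the utterance-level ID. Unpadded keys of this kind can collide once more fields are appended: season 1, episode 1, scene 11, line 2 and season 1, episode 11, scene 1, line 2 both give 11112. A minimal sketch, assuming the enrichment is a join on season and episode (the data and the key column names below are hypothetical), that joins on the two columns directly and builds fixed-width keys with sprintf():

```r
# Unpadded concatenation is ambiguous: both calls give "11112"
paste0(1, 1, 11, 2) == paste0(1, 11, 1, 2)

# Hypothetical utterance-level rows
utterances <- data.frame(
  season = c(1L, 1L, 10L), episode = c(1L, 1L, 1L),
  scene = c(1L, 1L, 1L), utterance = c(1L, 2L, 1L),
  speaker = c("Monica", "Joey", "Monica"),
  stringsAsFactors = FALSE
)
# Hypothetical episode-level metadata, one row per episode
episode_info <- data.frame(
  season = c(1L, 10L), episode = c(1L, 1L),
  title = c("The One Where Monica Gets a New Roommate",
            "The One After Joey and Rachel Kiss"),
  stringsAsFactors = FALSE
)

# Left join on both key columns: every utterance keeps its row
enriched <- merge(utterances, episode_info,
                  by = c("season", "episode"), all.x = TRUE)

# Zero-padded keys give each field a fixed width, so they cannot collide
enriched$episode_key <- sprintf("%02d%02d", enriched$season, enriched$episode)
enriched$line_key    <- sprintf("%02d%02d%03d%04d", enriched$season,
                                enriched$episode, enriched$scene, enriched$utterance)
print(enriched)
```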

Analysis

General Analysis

We first made a graph giving an overview of the data for the main characters.


The graph of the number of appearances of the main characters of Friends shows that Rachel appears most often, while Phoebe appears least often. This may be due to several factors, such as the popularity of Jennifer Aniston (the actress who plays Rachel) and the central place that Rachel occupies in the plots of the show.

However, the number of lines does not necessarily reflect the importance or complexity of a character. Phoebe may have fewer lines but still be a key character in certain situations. Similarly, Joey, who has fewer lines than Ross, can nonetheless be considered an essential character in the series thanks to his humor and charisma.

In sum, although the graph highlights differences in the number of lines spoken by each character, other factors must be taken into account to evaluate their importance and impact on the story.
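A minimal base R sketch of the count behind such a graph, assuming one entry per line of dialogue in a speaker column (the data below are hypothetical):

```r
main_characters <- c("Rachel", "Ross", "Monica", "Chandler", "Joey", "Phoebe")

# Hypothetical speaker column: one entry per line of dialogue
speaker <- c("Monica", "Joey", "Chandler", "Phoebe", "Rachel", "Rachel",
             "Ross", "Janice", "Scene Directions", "Rachel")

# Keep only the main characters; the factor levels guarantee a count (possibly 0) for each
main_lines  <- factor(speaker[speaker %in% main_characters], levels = main_characters)
line_counts <- sort(table(main_lines), decreasing = TRUE)
print(line_counts)
```

Adding Janice, Will and Richard to main_characters gives the comparison with secondary characters, and barplot(line_counts) draws a basic bar chart.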


We then added some secondary characters to get a first comparison with the main ones.


Including the secondary characters Janice, Will and Richard in the analysis shows again that the number of lines does not necessarily reflect the importance or popularity of a character: Janice and Richard had a significant impact on the plot and on the dynamics between the main characters despite a relatively low number of lines.


Finally, we made a word cloud to get an overall view of the most frequent words in the show.



The Friends word cloud highlights the most frequently used words in the show. The words “guys”, “god” and “talk” occur most often. “Guys” reflects how the friends address each other, “god” most likely comes from the recurring exclamation “Oh my God” rather than from references to religion, and “talk” points to the importance of dialogue and conversation in the series. However, a word cloud is not an exhaustive representation of the series and its themes.
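A word cloud is built from word frequencies after tokenisation and stop-word removal. A minimal base R sketch of that pipeline on a few hypothetical lines, with a deliberately tiny, hypothetical stop-word list (tidytext’s stop_words table is far larger; with tidytext the same steps are usually unnest_tokens() followed by anti_join() on stop_words):

```r
# Hypothetical lines of dialogue
dialogue <- c("Oh my God, you guys!", "We need to talk.",
              "Hey guys, what's going on?", "Oh my God! Talk to me.")

# Hypothetical, deliberately tiny stop-word list
stop_list <- c("oh", "my", "you", "we", "need", "to", "hey",
               "what's", "going", "on", "me")

# Lower-case, split on anything that is not a letter or apostrophe,
# then drop empty strings and stop words
tokens <- unlist(strsplit(tolower(dialogue), "[^a-z']+"))
tokens <- tokens[tokens != "" & !(tokens %in% stop_list)]

word_freq <- sort(table(tokens), decreasing = TRUE)
print(word_freq)
```

The frequencies can then be drawn with wordcloud::wordcloud(names(word_freq), as.numeric(word_freq), min.freq = 1); min.freq is lowered here because its default would hide words this rare. Note that “god” survives stop-word removal even when it comes from “Oh my God”.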

Episodes

Episodes Analysis


The goal here is to analyse the episodes of the show using the enrichment done earlier, which gives us the IMDb rating, US viewership and air date of each episode.


The graph of the average IMDb rating for each season shows that seasons 5 and 10 were the highest rated, perhaps thanks to well-constructed storylines and strong emotional moments. Seasons 1 and 9 were the lowest rated, perhaps because the characters were still being developed in the first season and because of some viewer fatigue by the ninth.



The graph of the average number of US viewers (in millions) for each season shows that season 2 was the most watched, perhaps because the series became more popular after its first season. Season 7 was the least watched, perhaps due to viewer fatigue after six seasons or to competition from other shows airing at the same time.
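Season-level figures like these are averages over episodes. A minimal base R sketch using toy values (illustrative only, not the real ratings or audiences) in columns named like imdb_rating and us_views_millions from the enriched data:

```r
# Toy episode metadata: illustrative values, not the real ratings or audiences
info <- data.frame(
  season            = c(1, 1, 2, 2, 3, 3),
  imdb_rating       = c(8.0, 8.2, 8.6, 8.4, 8.3, 8.5),
  us_views_millions = c(22.0, 24.0, 31.0, 30.0, 25.0, 26.0)
)

# Mean rating and mean audience per season in one call
by_season <- aggregate(cbind(imdb_rating, us_views_millions) ~ season,
                       data = info, FUN = mean)
print(by_season)

# Seasons with the highest mean rating and the largest mean audience
best_rated   <- by_season$season[which.max(by_season$imdb_rating)]
most_watched <- by_season$season[which.max(by_season$us_views_millions)]
```

Averaging per episode, rather than summing, keeps seasons with different numbers of episodes comparable.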



The heatmap of the average IMDb rating of each episode shows that the first episodes of each season are generally well rated, which may be explained by viewers’ initial enthusiasm for a new season. Interestingly, the 20th episodes seem to be rated lower overall, perhaps because of weaker plots or declining viewer interest towards the end of the season. The quality of the writing may also play a role, since it is difficult to keep viewers interested throughout a season, as may the actors’ performances if they tire of their roles late in the season. Finally, some episodes air at times of the year when viewers are less likely to watch television, which can affect ratings.
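A heatmap of this kind colours a season-by-episode matrix. A minimal base R sketch with toy ratings (not the real ones), where tapply() leaves missing cells as NA because seasons do not all have the same number of episodes, and colMeans() gives the average rating of each episode number across seasons:

```r
# Toy ratings; season 3 has only two episodes here
info <- data.frame(
  season      = c(1, 1, 1, 2, 2, 2, 3, 3),
  episode     = c(1, 2, 3, 1, 2, 3, 1, 2),
  imdb_rating = c(8.4, 8.0, 8.1, 8.6, 8.1, 8.2, 8.5, 8.2)
)

# Season x episode matrix: the values a heatmap colours (NA where no episode exists)
rating_matrix <- tapply(info$imdb_rating,
                        list(season = info$season, episode = info$episode), mean)

# Average rating of each episode number across the seasons that have it
episode_means <- colMeans(rating_matrix, na.rm = TRUE)
print(round(rating_matrix, 2))
print(round(episode_means, 2))
```

image(rating_matrix) draws a basic version; with ggplot2, geom_tile() on the long-format data gives the same picture.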



Analysis of the graph showing the proportion of main character appearances for all seasons of Friends reveals that the main characters all have relatively similar levels of participation throughout the series. However, it can be noted that Rachel, Ross and Monica tend to have slightly more appearances than the other main characters, while Phoebe and Joey tend to be slightly less present. This can be explained by the fact that Rachel, Ross and Monica are often at the center of the main plots of the series, while Phoebe and Joey often have sub-plots of their own.
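Proportions make seasons with different amounts of dialogue comparable. A minimal base R sketch with hypothetical data, where prop.table() divides each season’s counts by that season’s total so that each row sums to 1:

```r
# Hypothetical lines of dialogue tagged with season and speaker
appearances <- data.frame(
  season  = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("Rachel", "Ross", "Rachel", "Joey", "Monica", "Ross", "Ross", "Phoebe"),
  stringsAsFactors = FALSE
)

# Season x speaker counts, then shares within each season (row margin)
counts <- table(appearances$season, appearances$speaker)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```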

Characters

Emotional Analysis


The goal here is to focus the analysis on the characters and on the emotions they express and convey.



An analysis of the words most used by each character reveals interesting differences in the way they speak. “God” is the most used word for every main character except Joey; since it mostly appears in the exclamation “Oh my God”, this likely says more about how expressive these characters are than about their relationship with religion or spirituality. Joey’s most used word is “guy”, which may reflect his laid-back and friendly personality. Finally, “friend” is the word used least by Phoebe and Joey, which is interesting because it is the title of the show: these characters may see the others less as “friends” than as a chosen family.
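A minimal base R sketch of finding each character’s most used word, assuming the text has already been tokenised and stop words removed (the tokens below are hypothetical):

```r
# Hypothetical tokens, one row per word spoken
tokens <- data.frame(
  speaker = c("Joey", "Joey", "Joey", "Rachel", "Rachel", "Rachel",
              "Ross", "Ross", "Ross"),
  word    = c("guy", "guy", "god", "god", "god", "shoes",
              "god", "god", "dinosaurs"),
  stringsAsFactors = FALSE
)

# Speaker x word counts
counts <- table(tokens$speaker, tokens$word)

# For each speaker (row), the word with the highest count;
# which.max breaks ties by taking the first column, i.e. the first word in sorted order
top_word <- apply(counts, 1, function(freq) names(freq)[which.max(freq)])
print(top_word)
```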



The sentiment word cloud shows that the characters often use positive words such as “well” and “good”, reflecting the humor and optimism that characterize the show, although “well” is often a filler word (“Well, …”) rather than a sign of positivity. The most used negative word is “sorry”, which may indicate that the characters often have disagreements or misunderstandings but always try to fix things and make up, reinforcing the theme of friendship and togetherness in the show.
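Splitting words into positive and negative requires a sentiment lexicon such as Bing (available through tidytext’s get_sentiments("bing")). A minimal base R sketch with a hypothetical five-word lexicon in the same word/sentiment shape, so that it runs without downloading anything:

```r
# Hypothetical mini-lexicon with the same columns as the Bing lexicon
lexicon <- data.frame(
  word      = c("good", "well", "love", "sorry", "stupid"),
  sentiment = c("positive", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)
# Hypothetical tokens from the transcripts
words <- c("well", "good", "sorry", "well", "okay", "sorry",
           "sorry", "love", "good", "well")

# Inner join: words absent from the lexicon ("okay") are dropped
scored <- merge(data.frame(word = words, stringsAsFactors = FALSE),
                lexicon, by = "word")

# Word x sentiment counts, then the most frequent word on each side
counts <- table(scored$word, scored$sentiment)
top_positive <- names(which.max(counts[, "positive"]))
top_negative <- names(which.max(counts[, "negative"]))
print(counts)
```

A words-by-sentiment matrix of this shape is what wordcloud::comparison.cloud() takes; in the tidytext workflow it is usually built with reshape2::acast().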



This graph of the most frequent feelings in Friends shows that positive feelings and anticipation dominate, consistent with the humor, optimism and friendliness of the series. Negative feelings such as anger and disgust are rare, which suggests that the show focuses more on positive moments and friendships, part of its universal appeal to viewers.



This graph associates the most frequent words with the different feelings expressed in Friends. For example, the words “love”, “cute” and “happy” are often associated with joy and happiness, while the words “sorry”, “stupid” and “hate” are more frequently associated with negative feelings such as sadness or anger. This helps us understand the general tone of the series and the emotions it seeks to convey to viewers.
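Lexicons such as NRC attach one or more feelings to each word, so joining tokens to the lexicon repeats a word once per feeling it carries. A minimal base R sketch with hypothetical, simplified word/feeling pairs (not the actual NRC entries), taking the most frequent word within each feeling:

```r
# Hypothetical, simplified word/feeling pairs in the shape of the NRC lexicon
feelings <- data.frame(
  word    = c("love", "love", "happy", "happy", "hate", "hate", "sorry"),
  feeling = c("joy", "positive", "joy", "positive", "anger", "negative", "sadness"),
  stringsAsFactors = FALSE
)
# Hypothetical tokens; "pizza" is not in the lexicon and is dropped by the join
words <- c("love", "happy", "love", "hate", "sorry", "sorry", "pizza")

# Each word occurrence appears once per feeling it carries
assoc <- merge(data.frame(word = words, stringsAsFactors = FALSE),
               feelings, by = "word")

# Most frequent word within each feeling
top_by_feeling <- sapply(split(assoc$word, assoc$feeling),
                         function(w) names(which.max(table(w))))
print(top_by_feeling)
```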



This graph shows how often each character expresses the different feelings in the Friends series. Rachel comes out as the most positive character, which may indicate that she is associated with the happier and more joyful moments of the show, and may also reflect her generally optimistic and upbeat attitude throughout the series. This helps us understand the dynamics between the characters and how each contributes to the overall atmosphere of the series.

The other characters have different feeling profiles. Joey is more associated with anticipation, which may reflect his desire to succeed as an actor and his more impulsive approach to life. Chandler is more associated with sadness, which may reflect his sarcastic persona and his tendency to use humor to mask his emotions. Ross is more associated with anger, which may reflect his more irritable character and his tendency to be jealous. Comparing these profiles helps us understand the personalities and dynamics of the characters throughout the series.



The graph of feelings over the seasons shows that positive emotion decreases slightly in the later seasons, as do most other emotions, while positive emotion remains dominant throughout. That dominance may reflect a desire by the show’s creators to maintain an overall positive mood despite the twists, turns and difficulties the characters face. The slight decrease in positive emotion in the later seasons may be related to the series approaching its conclusion, with the characters facing greater and greater challenges. More broadly, the evolution of feelings over the seasons can mirror the evolution of the characters and their lives: at the beginning of the series they are young and carefree and life is good to them, which explains the dominance of positive emotion; as the seasons progress they grow up and face challenges and problems, which may explain its decrease. At the same time, the decrease in other emotions may reflect a broader evolution of the series, from a light comedy to a more serious and mature tone.
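Raw counts of feelings grow with the amount of dialogue in a season, so a decrease can partly reflect less text rather than a change in tone. A minimal base R sketch that expresses feeling counts per 1,000 words, with hypothetical tagged words and hypothetical per-season word totals (normalising by the tagged totals with prop.table() is a common alternative):

```r
# Hypothetical feeling-tagged words for two seasons
scored <- data.frame(
  season  = c(1, 1, 1, 1, 10, 10, 10),
  feeling = c("positive", "positive", "joy", "anger",
              "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)
# Hypothetical total number of words spoken per season (tagged or not)
words_per_season <- c("1" = 2000, "10" = 1600)

# Season x feeling counts
counts <- table(scored$season, scored$feeling)

# Divide each season's row by its word total, then express per 1,000 words
per_1000 <- sweep(counts, 1, words_per_season[rownames(counts)], "/") * 1000
print(round(per_1000, 3))
```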

Conclusion


After analyzing different graphs about the Friends series, several trends and characteristics were highlighted.

First, Rachel is the most present main character throughout the series, while Phoebe is the least present. Secondary characters such as Janice, Will and Richard have relatively few lines, yet Janice and Richard in particular have a significant impact on the story.

As far as seasons are concerned, seasons 5 and 10 were the highest rated, while seasons 1 and 9 were the lowest rated. Season 2 was the most watched, with an average of over 30 million US viewers per episode, while season 7 was the least watched, with fewer than 22 million. The 20th episode of each season was also generally rated lower, perhaps because of its position late in the season.

In terms of the words most used by the characters, “god” is the word most used by all the main characters, except Joey, who uses “guy” instead. The word least used by Phoebe and Joey is “friend”. In addition, the most used negative word in the show is “sorry”, while the most used positive words are “well” and “good”.

The sentiment analysis also showed that the most frequent feelings in the series are positivity and anticipation, while anger and disgust are the least frequent. Rachel is the most positive character in the series, while Chandler and Ross are the most negative.

Finally, the analysis of feelings over the seasons shows that positivity decreases slightly in the later seasons, as do most of the other emotions, while negative emotion remains relatively constant throughout the series, with a slight increase towards the end.

In conclusion, the analysis of different graphs allows for a better understanding of certain aspects of the Friends series, such as the presence of the characters, the most popular seasons, the most used words and the feelings expressed. The results show that the series is mostly positive and anticipatory, with a slight decrease in positivity towards the end of the series. These analyses can be useful in understanding what contributed to the popularity and longevity of the Friends series.