Topic Introduction

Movies have always been a consistent source of entertainment and joy in my life. I attribute a lot of this to my father, whose geekiness inspired and molded the geek within me. One of our favorite pastimes together, and a fond memory of mine growing up, was watching Sci-Fi and Fantasy movies. This hobby has had lasting positive effects on my life, chiefly through the massive communities that surround many of these movies. Meeting someone and getting into a long, elaborate talk about the Lord of the Rings series, or hearing their thoughts on the new Marvel movie, is an immediate form of bonding and consistently sparks conversation that I find interesting. This document investigates whether that sense of community, and the detailed conversation that Sci-Fi and Fantasy films inspire, carries over into the IMDB review section.

Data Introduction

I will be comparing movie reviews for very successful and very unsuccessful movies, with success determined by IMDB rating. I scraped data from the IMDB review pages of a few movies. The movies selected for this analysis are the highest rated Sci-Fi or Fantasy movie from each of the 3 most popular streaming services (popularity determined by subscriber count). For each movie, I collected only the top 25 reviews, as sorted by IMDB’s algorithm, so that my scraping would be minimally invasive to IMDB’s website. For each review, I collected the date of the review, the reviewer’s name, the name of the movie reviewed, and the full text of the review. The 3 services are Amazon Prime Video, Disney+, and Netflix. To compare these against lower rated movies, I also took the lowest rated Sci-Fi or Fantasy movie from each of these 3 services. The results are as follows:

Netflix:

Highest Rated = The Lord of the Rings: The Return of the King (9/10 Stars)

Lowest Rated = Aerials (1.5/10 Stars)

Amazon Prime Video:

Highest Rated = Rocketry: The Nambi Effect (8.8/10 Stars)

Lowest Rated = Finding Jesus (1.1/10 Stars)

Disney+:

Highest Rated = Star Wars: The Empire Strikes Back (8.7/10 Stars)

Lowest Rated = Kazaam (3.1/10 Stars)

Data Analysis

To test my hypothesis that higher rated movies receive longer reviews, I created a simple bar chart showing the average number of words per review for each of these movies.
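The averaging step behind that chart can be sketched as follows. This is a minimal illustration, assuming each scraped review is available as a (movie, text) pair and that a "word" is any whitespace-separated token; the function name and sample rows are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict


def average_words_per_review(reviews):
    """Return {movie: mean word count} for an iterable of (movie, text) pairs."""
    totals = defaultdict(int)   # total words seen per movie
    counts = defaultdict(int)   # number of reviews seen per movie
    for movie, text in reviews:
        totals[movie] += len(text.split())  # whitespace tokenization (assumption)
        counts[movie] += 1
    return {movie: totals[movie] / counts[movie] for movie in totals}


# Hypothetical sample rows, not real scraped reviews.
sample_reviews = [
    ("Movie A", "A sweeping and emotional finale"),
    ("Movie A", "Loved it"),
    ("Movie B", "Not good"),
]
averages = average_words_per_review(sample_reviews)
print(averages)
```

The resulting dictionary maps each movie to one number, which is the height of that movie's bar.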

Two of these 6 movies have garnered massive fanbases, and their average word count per review blows the other movies out of the water. Interestingly, Kazaam averages a few hundred more words per review than the third highly rated film, Rocketry: The Nambi Effect. This might be because Kazaam stars Shaquille O’Neal, who may be a polarizing figure for some viewers, causing them to write about the film at greater length. Rocketry: The Nambi Effect was also released in 2022, so it may not have an established fanbase just yet.

Since The Lord of the Rings and Star Wars are (unsurprisingly) the two outliers in terms of review length, I wanted to go a little deeper into the reviews for these two movies and see when they received their most in-depth reviews. To do this, I looked at the average length of reviews over time to see how discussion of these two historic franchises has held up.
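One way to compute average review length over time is to bucket reviews by movie and calendar year. The sketch below assumes yearly buckets and (movie, date, text) rows; the original analysis may have used a different time granularity, and all names and sample data here are hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_length_by_year(reviews):
    """Return {(movie, year): mean word count} for (movie, review_date, text) rows."""
    buckets = defaultdict(list)
    for movie, review_date, text in reviews:
        buckets[(movie, review_date.year)].append(len(text.split()))
    # Sort keys so each movie's years come out in chronological order.
    return {key: sum(lengths) / len(lengths) for key, lengths in sorted(buckets.items())}


# Hypothetical sample rows, not real scraped reviews.
sample_rows = [
    ("Movie A", date(2003, 12, 20), "one two three four"),
    ("Movie A", date(2003, 12, 28), "one two"),
    ("Movie A", date(2010, 5, 1), "one"),
]
by_year = average_length_by_year(sample_rows)
print(by_year)
```

With only 25 reviews per movie, many years will hold a single review or none, so any yearly average should be read with that small sample in mind.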

From this data, it looks like review length for The Return of the King spiked significantly around 2003 and 2005. This makes sense: the movie came out in 2003, so my educated guess is that avid fans with more to say wanted to make their voices heard as soon as they first saw it. Discussion of this movie has dropped off significantly since 2005. In comparison, the Star Wars movie has had a fairly steady amount of discussion throughout the years. This could very well be because the franchise extends well beyond a single trilogy, with its most recent movie coming out in 2019. There is also the confounding factor that this data comes only from the first 25 reviews that IMDB’s algorithm shows at the top of the movie review page.

Sentiment Scoring

To analyze the actual content of these reviews a little more, I wanted to conduct some sentiment scoring to investigate the emotion contained within them. To do this, I will be using the NRC lexicon. For those unfamiliar, the NRC lexicon is a list of English words, each associated with one or more human emotions (anger, fear, anticipation, trust, surprise, sadness, joy, disgust) and sentiments (positive, negative). To start off this analysis, I looked at the most frequently recurring words across all of these reviews.

word          n
luke          89
story         81
empire        68
time          64
characters    56
vader         56
trilogy       55
acting        52
scenes        52
battle        50

Star Wars is absolutely dominating this list. Many of the Star Wars reviewers touched on Luke, Vader, and the Empire, which isn’t surprising considering that The Empire Strikes Back features Luke and Vader’s fight scene. We do see some more generic words as well, although the lower rated movies don’t seem to have enough consistency across their reviews to make it onto this list.
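A word-frequency list like this one can be produced by lowercasing the review text, splitting it into tokens, and counting. The sketch below assumes common stop words were removed first (the list above contains none, but the write-up does not state this), and the tiny stop-word set, function name, and sample texts are illustrative only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; a real analysis would use a fuller list.
STOP_WORDS = {"the", "a", "and", "is", "of", "in", "it", "to", "this"}


def top_words(texts, n=10):
    """Return the n most common non-stop-word tokens as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())  # letters and apostrophes only
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical sample texts, not real scraped reviews.
sample_texts = [
    "Luke faces Vader in the story.",
    "The story of Luke is the best story.",
]
top = top_words(sample_texts, n=2)
print(top)
```

Because the counts are pooled across all six movies, movies with longer reviews naturally contribute more tokens to the list.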

Next, I wanted to investigate what sentiments and emotions occur most frequently within all of these reviews using the NRC lexicon.

sentiment       Count    Percent of scoreable words
positive        1732     22.31
negative        994      12.80
trust           928      11.95
anticipation    874      11.26
joy             831      10.70
fear            625      8.05
anger           551      7.10
sadness         476      6.13
disgust         388      5.00
surprise        366      4.71

It is refreshing to see that there are significantly more positive words used in these reviews than negative. However, as we saw earlier in this document, people have much more to say about a good movie than a bad one, so the highly rated movies contribute more of the words counted here. Words associated with trust and anticipation occur the most frequently among the emotions, which makes sense: effective movies hold the viewer in anticipation, and trust is a major theme of The Empire Strikes Back. This analysis is more meaningful when split up by movie, so I wanted to dive a little deeper into the emotions in the reviews for each movie individually.
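The tallying step can be sketched as a lookup of each review word in a word-to-categories mapping. The percentages in the table above are consistent with each count divided by the sum of all ten counts (for example, 1732 out of roughly 7765 is about 22.31), so the sketch computes percentages the same way. The small lexicon below is a made-up stand-in and does not reproduce real NRC entries; the function name is hypothetical.

```python
from collections import Counter

# Made-up stand-in for the lexicon; these are NOT real NRC entries.
TOY_LEXICON = {
    "hero": {"positive", "trust"},
    "dark": {"negative", "fear"},
    "love": {"positive", "joy"},
}


def tally_categories(words, lexicon):
    """Return {category: (count, percent of all category matches)}."""
    counts = Counter()
    for word in words:
        # A word may carry several categories; words not in the lexicon add nothing.
        counts.update(lexicon.get(word, ()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: (n, round(100 * n / total, 2)) for cat, n in counts.most_common()}


tally = tally_categories(["hero", "love", "dark", "movie"], TOY_LEXICON)
print(tally)
```

Note that a single word can count toward several rows at once, which is why the rows sum to more than the number of distinct matched words.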

To do this, I visualized the same data in a bar chart, split up by individual movie.
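Splitting the tally by movie only requires keeping one counter per movie. The following is a minimal sketch under the same assumptions as before: reviews are (movie, text) pairs, tokens are whitespace-separated, and the lexicon is a made-up stand-in rather than real NRC data. All names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up stand-in for the lexicon; these are NOT real NRC entries.
TOY_LEXICON = {
    "hero": {"positive", "trust"},
    "dark": {"negative", "fear"},
    "scary": {"negative", "fear"},
}


def categories_by_movie(reviews, lexicon):
    """Return {movie: Counter of category matches} for (movie, text) pairs."""
    per_movie = defaultdict(Counter)
    for movie, text in reviews:
        for word in text.lower().split():
            per_movie[movie].update(lexicon.get(word, ()))
    return per_movie


# Hypothetical sample rows, not real scraped reviews.
sample = [("Movie A", "brave hero"), ("Movie B", "dark scary dark")]
per_movie = categories_by_movie(sample, TOY_LEXICON)
for movie, counts in per_movie.items():
    print(movie, dict(counts))
```

Each movie's counter then supplies the bar heights for that movie's panel of the chart.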

For each of the 3 highly rated movies, the number of positive words vastly outweighs the number of negative words, which is to be expected. For Finding Jesus and Aerials, there are more negative words than positive, which again makes sense: people typically use more positive words for movies they enjoyed than for movies they didn’t. Surprisingly, Kazaam has more positive words than negative, so maybe this movie was enjoyed slightly more than its score suggests. Aerials has fear as its most frequent emotion, which makes sense given that it is an alien invasion thriller, and I find it fascinating that we can see the genre reflected in the reviews. Finding Jesus has anticipation and anger as its two most frequent emotions, and it seems to be the least enjoyed of these 3 movies.