Movie Review Comparison
Introduction to Movie Reviews Comparison
I am a big movie fan, and a while back I was introduced to Letterboxd, an app and website where anyone can rate and review movies. If you take a quick look at some of the reviews, you will see that they are all over the place. You have the wannabe movie critics who break down every detail of the movie, the person who quotes a random line from the movie, and the overly out-of-pocket person who writes something barely related to the movie. This is why I call it the Twitter for movies. It is also rare to see movies with an average rating over 4 stars, so that is how you know a movie is good on this platform. Still, it is a fun little space to track the movies you’ve watched and rate/review them, create watchlists for what you want to see, and read random reviews that earn a bunch of likes from other users.
Here are some example reviews under The Wolf of Wall Street page that perfectly capture the variety of users on the platform:
Now you see what I mean. So, since it is a fun and easy-to-use platform, I thought it would be interesting to scrape some of these reviews, similar to how people used to scrape Twitter pre-Elon.
Personally, I was a big fan of the MCU growing up, and this led me to compare users’ reviews for Iron Man (2008) and Captain America: The First Avenger (2011).
Collecting the Data
As I mentioned, the Letterboxd app and website are very straightforward and user-friendly. Scraping different pieces of data from the website was mostly just as straightforward. The default page for reviews shows the reviews with the most likes from other users, and I decided to stick with this when scraping. Another method could be sorting by earliest reviews, but since everyone sees the most popular reviews first and could have their opinion of a movie influenced by them, I decided to do my analyses with the default: popular reviews.
Data Types
Reviewer Name: Account name of the user writing the review.
Review Date: Date when the review was posted.
Rating: Star rating from 1/2 to 5 that the user gave the movie, in increments of 1/2 a star.
Review Content: The text the user wrote to review the movie.
Movie Name: Name of the movie to track and compare different movies.
Movie Year: Year of the movie to track and compare different movie releases.
Page ID: The URL of the page of reviews that was scraped.
Review ID: Unique number associated with a review.
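To make the fields above concrete, here is a minimal Python sketch of how one scraped review could be represented. The class name, field names, and the example values (including the placeholder URL) are assumptions for illustration, not the actual column names in the dataset. The rating is optional because a review does not need a rating attached.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Review:
    """One scraped review; field names are illustrative, not the dataset's."""
    reviewer_name: str
    review_date: date
    rating: Optional[float]  # None when the user left no rating
    review_content: str
    movie_name: str
    movie_year: int
    page_id: str             # URL of the page of reviews that was scraped
    review_id: int

    def __post_init__(self) -> None:
        # Ratings run from 0.5 to 5 stars in half-star steps.
        if self.rating is not None:
            in_range = 0.5 <= self.rating <= 5.0
            half_step = float(self.rating * 2).is_integer()
            if not (in_range and half_step):
                raise ValueError(f"invalid rating: {self.rating}")


# Made-up example values; the URL is a placeholder, not a real review page.
example = Review(
    reviewer_name="some_user",
    review_date=date(2019, 4, 28),
    rating=4.5,
    review_content="I am Iron Man.",
    movie_name="Iron Man",
    movie_year=2008,
    page_id="https://example.com/reviews/page/1",
    review_id=1,
)
```

Checking the rating when the record is created means a bad value such as 4.3 fails right away instead of quietly ending up in a chart later.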
Link to original scraped dataset:
Sentiment Analyses
1. How does the positivity score change by year of the review?
For this question, I wanted to see if people were more positive towards one of the movies as time went on. Before I dove into it, though, I wanted to see how many words there were per year to begin with, since the reviews I pulled were sorted by popularity and not date. Many of the more popular reviews are more recent, which could be because more people are getting the app, rewatching the movie, or some other factor.
This is my look at words from reviews by year:
I can see that, as I mentioned, many of the reviews are more recent. Avengers: Infinity War came out in 2018 and Avengers: Endgame came out in 2019. The incredible success of both of those movies may be a factor that got people to rewatch the older movies in the Marvel timeline and review them on Letterboxd. Just a theory though.
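For anyone who wants to try the words-per-year count themselves, here is a minimal Python sketch. The function names, the simple regex tokenizer, and the sample reviews are all assumptions for illustration, and a real analysis would probably also drop stop words.

```python
import re
from collections import Counter
from datetime import date


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def words_per_year(reviews):
    """Count review words per year from (review_date, text) pairs."""
    counts = Counter()
    for review_date, text in reviews:
        counts[review_date.year] += len(tokenize(text))
    return dict(sorted(counts.items()))


# Made-up sample reviews, not taken from the scraped dataset.
sample = [
    (date(2019, 5, 1), "I love this movie"),
    (date(2019, 6, 1), "Great"),
    (date(2012, 1, 1), "Not bad at all"),
]
print(words_per_year(sample))  # {2012: 4, 2019: 5}
```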
Now that we know the data set is skewed toward more recent reviews, we can analyze the question at hand. I will split each review into words that can be scored as positive or negative and then compare the two movies’ scores by year. I believe there is still value to be gained from this analysis because of how Endgame turned out (without spoiling anything… though it’s been 5 years so everyone should know by now, but I’ll still be nice) and the recent underwhelming releases from Marvel post-Avengers series.
I would conclude that as time went on, reviews for Iron Man became more positive, but Captain America did not follow the same trend. Captain America had a peak in 2019-21 but scored low before and after that.
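Here is a rough Python sketch of one way to score positivity by year. The tiny word lists are made up and stand in for a real sentiment lexicon, and the net score (positive words minus negative words) is just one reasonable way to define a positivity score, not necessarily the one behind the chart here.

```python
import re

# Hypothetical mini word lists standing in for a real sentiment lexicon.
POSITIVE = {"love", "great", "fun", "best"}
NEGATIVE = {"boring", "bad", "worst", "hate"}


def positivity_by_year(reviews):
    """Net score (positive minus negative words) per (movie, year).

    `reviews` is an iterable of (movie, year, text) tuples.
    """
    scores = {}
    for movie, year, text in reviews:
        key = (movie, year)
        scores.setdefault(key, 0)  # keep years that have no scored words
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in POSITIVE:
                scores[key] += 1
            elif word in NEGATIVE:
                scores[key] -= 1
    return scores


# Made-up sample reviews, not taken from the scraped dataset.
sample = [
    ("Iron Man", 2019, "I love this, great fun"),
    ("Iron Man", 2012, "bad and boring but fun"),
    ("Captain America", 2019, "the worst"),
]
print(positivity_by_year(sample))
# {('Iron Man', 2019): 3, ('Iron Man', 2012): -1, ('Captain America', 2019): -1}
```

Because the number of review words differs so much from year to year, dividing each score by that year's word count would make the years easier to compare.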
2. What emotions do reviewers write the most about for these Marvel movies?
To answer this question, I used the NRC emotion lexicon to link words with specific emotions. This allows me to graph these emotions and see how often words for each emotion appear per movie.
I see that both movies have very high counts of positive words, with Iron Man over 1,500 and Captain America just under 1,500. The next most common emotions are anticipation, negative, trust, and joy. This makes sense because both movies have good ratings on Letterboxd (3.7 for Iron Man, 3.3 for Captain America). There are people who love these classic Marvel movies, and there are people who will take hating on Marvel to their grave. That explains the negative words: people who do not like something about the movie will elaborately describe what is wrong with it and use all kinds of negative language. Anticipation makes perfect sense too, since both of these movies are the first in their respective series, so people look forward to the next one and to seeing the plot develop.
In terms of comparing the two movies, there is not much that is very different except trust, where Iron Man scores much higher. Overall, both movies tend to have the same emotions portrayed in reviews about equally often.
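The emotion counting can be sketched in Python like this. In the NRC lexicon a single word can be tagged with several categories, so one word can add to more than one count. The toy lexicon below is made up for illustration and does not copy the real NRC entries; the function name and sample reviews are assumptions too.

```python
import re
from collections import Counter

# Made-up word-to-category mapping in the style of the NRC lexicon.
# These are NOT the real NRC entries.
TOY_LEXICON = {
    "love": {"positive", "joy"},
    "hero": {"positive", "trust", "anticipation"},
    "boring": {"negative"},
    "villain": {"negative", "fear"},
}


def emotion_counts(reviews):
    """Count emotion-tagged words per movie from (movie, text) pairs."""
    counts = {}
    for movie, text in reviews:
        movie_counts = counts.setdefault(movie, Counter())
        for word in re.findall(r"[a-z']+", text.lower()):
            # A word tagged with several categories adds one to each.
            for emotion in TOY_LEXICON.get(word, ()):
                movie_counts[emotion] += 1
    return counts


# Made-up sample reviews, not taken from the scraped dataset.
sample = [
    ("Iron Man", "A hero I love"),
    ("Captain America", "boring villain"),
]
counts = emotion_counts(sample)
print(counts["Iron Man"]["positive"])         # 2
print(counts["Captain America"]["negative"])  # 2
```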
3. What is the relationship between rating and review emotions?
This question now introduces the rating that a reviewer attached to their review. I want to look at the number of words that fall under certain emotions, just like question 2, but now compare them by rating for each movie.
Users are not required to put a rating with a review, and are not required to put a review with a rating. However, in this project, I scraped reviews, so every entry has a review but not necessarily a rating. That is why there will be a bar for NA ratings. I kept it since the words in the review are still connected to emotions, and I can see what type of emotion a person who did not leave a rating has in their review.
This visual may require some zooming, but the colors that stand out immediately are violet and bright blue because they appear to be the most common for many of the emotions. Violet represents 4 stars and bright blue is 3.5 stars. Positive emotion words are the most common for both movies, but I see that there are more positive words in 4 to 5 star reviews for Iron Man than for Captain America.
Overall, I see more teal (3 stars) on the Captain America side than on the Iron Man side across most emotions, which means that people rated the movie lower regardless of the emotion they put into their review. However, I would conclude that since Iron Man has more positive, trust, and anticipation words, it also has more of those purple to pink colors (4.5 to 5 star reviews) across the board.
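Here is a small Python sketch of breaking emotion counts down by rating, including the NA group for reviews with no rating. The toy lexicon, function name, and sample reviews are made up for illustration, and the real NRC lexicon would replace the toy one.

```python
import re
from collections import Counter

# Made-up word-to-category mapping; these are NOT the real NRC entries.
TOY_LEXICON = {
    "love": {"positive", "joy"},
    "hero": {"positive", "trust"},
    "boring": {"negative"},
}


def emotions_by_rating(reviews):
    """Count emotion words per (movie, rating label, emotion).

    `reviews` is an iterable of (movie, rating, text); rating may be None.
    """
    counts = Counter()
    for movie, rating, text in reviews:
        # Reviews with no rating go into their own "NA" group.
        label = "NA" if rating is None else f"{rating:g}"
        for word in re.findall(r"[a-z']+", text.lower()):
            for emotion in TOY_LEXICON.get(word, ()):
                counts[(movie, label, emotion)] += 1
    return counts


# Made-up sample reviews, not taken from the scraped dataset.
sample = [
    ("Iron Man", 4.0, "love the hero"),
    ("Iron Man", None, "love it"),
    ("Captain America", 3.0, "boring"),
]
by_rating = emotions_by_rating(sample)
print(by_rating[("Iron Man", "4", "positive")])  # 2
print(by_rating[("Iron Man", "NA", "joy")])      # 1
```

Treating NA as its own label keeps unrated reviews in the counts, so their words still show up as a separate bar.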
Conclusion
After taking a look at a few different sentiment analyses, I would say that Iron Man is more favored among Letterboxd reviewers than Captain America: The First Avenger. That makes sense given that it has a higher average rating. Beyond that, this data helped me see that Iron Man had higher positivity scores (particularly in recent years), more emotion words in reviews with many being “good” emotions, and that those “good” emotions were related to the rating the user gave the movie.
I am happy Iron Man performed better because that first Iron Man movie is one of my favorite, if not my favorite, Marvel movies to this day.