An Analysis of IMDB’s Top 250 Movies

Author

Mallory Bowling

Line of Inquiry

What makes a movie “great” according to IMDB users?

This project explores the characteristics listed on IMDB’s Top 250 Movies webpage to identify patterns in run time, release year, and user ratings. I’m interested in whether longer movies receive higher scores, whether older films maintain popularity, and how user ratings are distributed across time. Although I have not watched most of the movies on the list, I’m curious to see whether there is a consistent formula for success.

How the Inquiry Will Be Answered

To answer this question, I scraped data from IMDB’s Top 250 movies list using the chromote, rvest, and httr packages in R. The dataset includes the movie title, year of release, run time in minutes, and average IMDB user rating. I chose this dataset because it is curated by user votes and represents widely accepted “great” movies.

I initially attempted to scrape the page with the polite package’s bow-and-nod approach, but I could only scrape the first 25 movies. So, I switched to scraping with chromote. After scraping, I converted messy text into numeric data, isolated the year from blocks of metadata, and converted run time strings into numeric minute values.

Data Wrangling

After scraping, the data required several transformation steps: Titles were stripped of numerical ranking using regex. Release years were extracted from multi-value strings using str_sub. Run times were parsed from strings like “2h 30m”, accounting for edge cases like “45m” or “1h” using str_match. Star ratings were parsed and converted to numeric. All variables were assembled into a tidy tibble with appropriate types. Each step ensures a uniform structure that supports filtering, plotting, and statistical analysis.
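The wrangling steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: it uses base R regular expressions rather than the stringr functions (str_sub, str_match) named above so that it runs without extra packages. The helper names (strip_rank, extract_year, extract_unit, parse_runtime) and the sample input strings are assumptions made for the example, including the assumption that each metadata block begins with the four-digit year.

```r
# Base R sketch of the wrangling steps described above.
# All helper names and sample strings are hypothetical.

# Remove a leading rank such as "1. " from a title.
strip_rank <- function(title) {
  sub("^[0-9]+\\.\\s*", "", title, perl = TRUE)
}

# Take the release year from the first four characters of a metadata block.
# Assumes the block starts with the year, e.g. "19942h 22mR".
extract_year <- function(meta) {
  as.integer(substr(meta, 1, 4))
}

# Pull the number that precedes a unit letter ("h" or "m").
# Returns 0 when the unit is absent and NA when the input is NA.
extract_unit <- function(x, unit) {
  pattern <- paste0("[0-9]+(?=\\s*", unit, ")")
  hit <- regexpr(pattern, x, perl = TRUE)
  found <- !is.na(hit) & hit > 0
  out <- rep(0, length(x))
  out[found] <- as.numeric(regmatches(x, hit))
  out[is.na(x)] <- NA
  out
}

# Convert strings like "2h 30m", "45m", or "1h" to total minutes.
parse_runtime <- function(x) {
  extract_unit(x, "h") * 60 + extract_unit(x, "m")
}

strip_rank("1. The Shawshank Redemption")
extract_year("19942h 22mR")
parse_runtime(c("2h 30m", "45m", "1h"))
```

Treating hours and minutes as two independent lookups means the edge cases with only one unit need no special handling: the missing unit simply contributes zero.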

Analysis and Interpretation

Histogram of Movie Ratings

Most Top 250 movies have IMDB scores between 8.0 and 9.0, with very few outliers above 9.2.

Scatterplot of Runtime vs. Rating

There is a slight positive correlation between movie length and rating, but it’s weak, suggesting runtime alone doesn’t explain quality.
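One way to put a number on a relationship like this is sketched below. It is an illustration only: the data frame name movies, its column names, and the six rows in it are made up so the snippet runs, and they are not the scraped Top 250 values. Comparing Pearson and Spearman coefficients is a suggested check, not something the original analysis is known to have done.

```r
# Made-up rows so the snippet runs; these are NOT the scraped Top 250 values.
movies <- data.frame(
  runtime_min = c(95, 110, 125, 142, 160, 201),
  rating      = c(8.1, 8.0, 8.3, 8.2, 8.6, 8.4)
)

# Pearson measures linear association and is sensitive to extreme run times.
r_pearson <- cor(movies$runtime_min, movies$rating, method = "pearson")

# Spearman works on ranks, so a single very long film has less influence.
r_spearman <- cor(movies$runtime_min, movies$rating, method = "spearman")

# Slope of a simple linear fit: rating points per additional minute.
fit <- lm(rating ~ runtime_min, data = movies)
slope <- unname(coef(fit)["runtime_min"])

c(pearson = r_pearson, spearman = r_spearman, slope = slope)
```

If the two coefficients differ noticeably on the real data, that would hint that a few extreme run times are driving the linear estimate.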

Bar Chart of Top 10 Longest Movies

This chart highlights the obvious outlier, Gangs of Wasseypur, which is the only movie longer than approximately 230 minutes, at 5 hours and 21 minutes (321 minutes). This one movie pulls the average run time upward.

Bar Chart of the Top 10 Highest Rated Movies

The bar chart shows that the movies with the highest user ratings do not necessarily appear in the same order as their official IMDB rank in the Top 250. The Lord of the Rings: The Return of the King has the third-highest user rating but is sixth on the list. While the two positions are close, this shows that IMDB uses more than the users’ average rating alone to determine its Top 250 movies.

Boxplot of Rating by Movie Length

Long movies have a higher median user rating than medium- and short-length movies. Short and medium movies have nearly identical median user ratings, but short movies have a higher upper quartile than medium-length ones. Overall, this suggests that a movie longer than 120 minutes is somewhat more likely to have a higher IMDB user rating.
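A grouping like this can be sketched with base R’s cut(). The 120-minute boundary for long movies comes from the text above; the 90-minute boundary between short and medium is an assumption made for this example, as are the helper names length_group and median_by_length and the made-up values used to run them.

```r
# Assign each run time to a length group.
# long_min = 120 follows the text; short_max = 90 is an assumed cut-off.
length_group <- function(runtime_min, short_max = 90, long_min = 120) {
  cut(runtime_min,
      breaks = c(-Inf, short_max, long_min, Inf),
      labels = c("Short", "Medium", "Long"))
}

# Median rating within each length group.
median_by_length <- function(runtime_min, rating) {
  tapply(rating, length_group(runtime_min), median)
}

# Made-up values so the snippet runs; NOT the scraped data.
runtime_min <- c(85, 88, 100, 120, 121, 150)
rating      <- c(8.1, 8.3, 8.0, 8.2, 8.4, 8.6)
median_by_length(runtime_min, rating)
```

Because cut() closes intervals on the right by default, a movie of exactly 120 minutes lands in the medium group, which matches the "longer than 120 minutes" definition of long.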

Conclusion

Analyzing user ratings, release years, and runtimes revealed several meaningful trends.

First, the distribution of ratings showed that nearly all Top 250 films fall within a narrow 8.0 to 9.0 rating range, reinforcing that the list represents a highly selective group. Outliers above 9.2 are rare and should be viewed as exceptional rather than representative.

Second, the relationship between movie length and user rating revealed a slight positive trend: longer movies tend to receive somewhat higher ratings. This trend appeared in both a scatterplot with a positive slope and a boxplot showing that long movies (>120 minutes) have a higher median rating than short or medium-length films. Because the correlation is weak, the pattern is suggestive rather than conclusive; one possible explanation is that longer runtimes offer more space for character development and storytelling, which users tend to reward.

Third, the top 10 longest movies highlighted an extreme outlier—Gangs of Wasseypur, which is more than five hours long and significantly skews the run time average. Yet despite its length, it still maintains a spot in the Top 250, reinforcing the idea that run time alone is not a disqualifier when perceived quality is high.

Finally, an analysis of the Top 10 highest-rated movies showed that IMDB’s official rankings are not determined solely by user ratings. For example, The Lord of the Rings: The Return of the King has the third-highest user score but ranks sixth overall. This discrepancy suggests that other factors—such as vote volume, recency bias, or internal weighting—play a role in how IMDB ranks its Top 250.

Overall, while there is no single formula for what makes a movie “great,” the data suggests that user preferences tend to favor longer films and that extremely high ratings are reserved for a small elite.