This project explores which factors might influence how well a movie is rated. Using a dataset of nearly 10,000 movies, we analyze how genre, popularity, vote count, and language relate to average rating.
2025-06-15
Data were loaded from a CSV file containing movie titles, release dates, vote averages, vote counts, popularity scores, languages, genres, and more. Our main variable of interest is vote_average, which ranges from 0 to 10.
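As a rough illustration of the loading step, the sketch below reads a CSV and summarizes vote_average. It is not the code used for this report: the inline sample data, the columns other than vote_average, and the helper names load_ratings and summarize are all assumptions made for the example.

```python
import csv
import io
import statistics

# Hypothetical stand-in for the real CSV file. Only the vote_average
# column name comes from the report; the rows are made up.
SAMPLE_CSV = """title,vote_average,vote_count
Movie A,7.5,1200
Movie B,6.0,300
Movie C,8.1,4500
Movie D,,10
"""


def load_ratings(handle):
    """Return vote_average values as floats, skipping blank or out-of-range entries."""
    ratings = []
    for row in csv.DictReader(handle):
        raw = (row.get("vote_average") or "").strip()
        if not raw:
            continue  # missing rating
        value = float(raw)
        if 0.0 <= value <= 10.0:  # the range stated for vote_average
            ratings.append(value)
    return ratings


def summarize(ratings):
    """Basic summary statistics for a list of ratings."""
    return {
        "n": len(ratings),
        "min": min(ratings),
        "max": max(ratings),
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
    }


ratings = load_ratings(io.StringIO(SAMPLE_CSV))
summary = summarize(ratings)
print(summary)
```

To run this against a real file, the io.StringIO object could be replaced with an open file handle, for example open(path, newline="", encoding="utf-8").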
We perform exploratory analysis using summary statistics and visualizations. Multi-genre entries are split into individual genres for analysis, and plots are used to inspect trends in ratings.
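The genre split can be sketched as follows. This assumes each movie stores its genres as a single comma-separated string, which the report does not confirm; the sample rows and the function names are hypothetical.

```python
from collections import defaultdict
import statistics

# Hypothetical rows; the comma-separated "genres" format is an assumption.
movies = [
    {"title": "Movie A", "genres": "Drama, Music", "vote_average": 8.0},
    {"title": "Movie B", "genres": "Action", "vote_average": 6.0},
    {"title": "Movie C", "genres": "Drama, Action", "vote_average": 7.0},
]


def split_genres(rows, sep=","):
    """Yield one (genre, rating) pair for every genre listed on each movie."""
    for row in rows:
        for genre in row["genres"].split(sep):
            genre = genre.strip()
            if genre:
                yield genre, row["vote_average"]


def mean_rating_by_genre(rows):
    """Group ratings by individual genre and average them."""
    by_genre = defaultdict(list)
    for genre, rating in split_genres(rows):
        by_genre[genre].append(rating)
    return {genre: statistics.mean(vals) for genre, vals in by_genre.items()}


genre_means = mean_rating_by_genre(movies)
# Print genres from highest to lowest mean rating.
for genre, mean in sorted(genre_means.items(), key=lambda kv: -kv[1]):
    print(f"{genre}: {mean:.2f}")
```

Note that a movie with several genres contributes its rating to each of them, so the per-genre groups overlap.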
Movies with higher popularity scores tend to receive slightly higher ratings, but the correlation is weak. Certain genres, such as documentaries and music films, tend to have higher average ratings. Language also plays a role: non-English films often score highly when enough votes are available.
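The two checks behind these findings can be sketched as a Pearson correlation between popularity and rating, and a per-language mean restricted to films with enough votes. The rows, the column names other than vote_average, and the min_votes threshold of 100 are illustrative assumptions; the report does not state which cutoff or correlation measure was used.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_rating_by_language(rows, min_votes=100):
    """Average rating per language, ignoring films below min_votes (assumed cutoff)."""
    by_language = defaultdict(list)
    for row in rows:
        if row["vote_count"] >= min_votes:
            by_language[row["language"]].append(row["vote_average"])
    return {lang: sum(vals) / len(vals) for lang, vals in by_language.items()}


# Hypothetical rows for illustration only.
rows = [
    {"language": "en", "popularity": 10.0, "vote_average": 6.0, "vote_count": 500},
    {"language": "en", "popularity": 30.0, "vote_average": 7.0, "vote_count": 900},
    {"language": "fr", "popularity": 20.0, "vote_average": 8.0, "vote_count": 400},
    {"language": "ja", "popularity": 5.0, "vote_average": 9.0, "vote_count": 3},
]

r = pearson([row["popularity"] for row in rows],
            [row["vote_average"] for row in rows])
language_means = mean_rating_by_language(rows)
print(f"popularity vs. rating: r = {r:.3f}")
print(language_means)
```

The vote threshold matters because a film with only a handful of votes can have an extreme average that says little about how it would be rated more broadly.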
Further analysis could include modeling interactions or predicting rating categories using classification techniques.