2025-06-15

introduction

This project explores which factors influence how well a movie is rated. We use a dataset of nearly 10,000 movies to analyze how genre, popularity, vote count, and language relate to average rating.

methods

Data were loaded from a CSV file containing movie titles, release dates, vote averages, vote counts, popularity, language, genres, and more. Our main variable of interest is vote_average, which ranges from 0 to 10.

We perform exploratory analysis using summary statistics and visualizations. The genres field is split into individual genres for analysis, and plots are used to inspect trends in ratings.

load and clean data
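The loading step is not shown in this report, so the following is only a minimal sketch of what it might look like, written in Python with the standard library. Only vote_average is named in the text; the other column names (vote_count, popularity, original_language, genres), the inline sample rows, and the helper names to_float and load_movies are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV file. Column names other
# than vote_average are assumptions, and the rows are made up.
SAMPLE_CSV = """title,vote_average,vote_count,popularity,original_language,genres
Movie A,7.5,1200,35.2,en,"Drama, Music"
Movie B,,15,2.1,fr,Documentary
Movie C,6.1,300,12.0,en,"Action, Drama"
Movie D,11.0,50,4.4,ja,Animation
"""


def to_float(text):
    """Return text as a float, or None when it is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def load_movies(handle):
    """Read movie rows, keeping only those with a vote_average in [0, 10]."""
    movies = []
    for row in csv.DictReader(handle):
        rating = to_float(row.get("vote_average"))
        if rating is None or not 0 <= rating <= 10:
            continue  # drop rows with a missing or out-of-range rating
        row["vote_average"] = rating
        row["vote_count"] = to_float(row.get("vote_count"))
        row["popularity"] = to_float(row.get("popularity"))
        movies.append(row)
    return movies


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(len(movies), [m["title"] for m in movies])
```

With a real file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`. Dropping out-of-range ratings is one possible cleaning rule, not necessarily the one used here.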

most common genres
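Counting genres requires splitting each movie's genre field first. The sketch below assumes the field is a comma-separated string, which the source does not confirm; the sample rows and the helper name split_genres are made up for illustration.

```python
from collections import Counter

# Made-up rows; the real data would come from the CSV file.
movies = [
    {"title": "Movie A", "genres": "Drama, Music"},
    {"title": "Movie B", "genres": "Documentary"},
    {"title": "Movie C", "genres": "Action, Drama"},
    {"title": "Movie D", "genres": ""},
]


def split_genres(text, sep=","):
    """Split a genre string into a list of trimmed, non-empty genre names."""
    return [g.strip() for g in (text or "").split(sep) if g.strip()]


# Each movie contributes one count to every genre it lists.
genre_counts = Counter(g for m in movies for g in split_genres(m["genres"]))
top_genres = genre_counts.most_common(3)
print(top_genres)
```

Movies with an empty genre field simply contribute nothing to the counts.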

average rating by genre
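One way to compute a per-genre average is to group ratings under every genre a movie lists and then take the mean of each group. This is a sketch under the same assumption of a comma-separated genres field, with made-up sample rows.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only.
movies = [
    {"genres": "Drama, Music", "vote_average": 8.0},
    {"genres": "Documentary", "vote_average": 7.0},
    {"genres": "Action, Drama", "vote_average": 6.0},
]

# Collect the ratings of every movie under each genre it lists.
ratings_by_genre = defaultdict(list)
for movie in movies:
    for genre in movie["genres"].split(","):
        genre = genre.strip()
        if genre:
            ratings_by_genre[genre].append(movie["vote_average"])

genre_means = {g: mean(r) for g, r in ratings_by_genre.items()}

# Rank genres from highest to lowest mean rating.
ranked = sorted(genre_means.items(), key=lambda item: item[1], reverse=True)
for genre, avg in ranked:
    print(f"{genre}: {avg:.2f}")
```

Note that a movie with several genres counts toward the mean of each one, so the groups overlap.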

popularity vs rating
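The strength of the relationship between popularity and rating can be summarized with a Pearson correlation coefficient. The report does not say how the correlation was measured, so this is just one reasonable choice, and the numbers below are made up rather than taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only.
popularity = [2.1, 12.0, 35.2, 4.4, 60.0]
rating = [6.8, 6.1, 7.5, 7.0, 6.4]

r = pearson(popularity, rating)
print(f"r = {r:.3f}")
```

A value of r near 0 indicates a weak linear relationship, while values near 1 or -1 indicate a strong one.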

average rating by language
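Averages for languages with few votes can be misleading, so a per-language comparison usually needs a minimum vote threshold. The sketch below filters individual movies by vote count before averaging. The threshold MIN_VOTES, the column name original_language, and the sample rows are all assumptions, and the report does not state which filtering rule was actually used.

```python
from collections import defaultdict
from statistics import mean

MIN_VOTES = 100  # hypothetical threshold, not taken from the report

# Made-up rows for illustration only.
movies = [
    {"original_language": "en", "vote_average": 6.0, "vote_count": 1200},
    {"original_language": "en", "vote_average": 7.0, "vote_count": 300},
    {"original_language": "fr", "vote_average": 7.8, "vote_count": 450},
    {"original_language": "ja", "vote_average": 9.5, "vote_count": 3},
]

# Keep only movies with enough votes, grouped by language.
ratings_by_language = defaultdict(list)
for movie in movies:
    if movie["vote_count"] >= MIN_VOTES:
        ratings_by_language[movie["original_language"]].append(
            movie["vote_average"]
        )

language_means = {lang: mean(vals) for lang, vals in ratings_by_language.items()}
for lang, avg in sorted(language_means.items(), key=lambda item: -item[1]):
    print(f"{lang}: {avg:.2f}")
```

In this made-up sample, the highly rated "ja" movie is excluded because it has only 3 votes, which shows how the threshold keeps a single sparsely rated film from dominating a language's average.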

conclusion

Movies with higher popularity scores tend to receive slightly higher ratings, but the correlation is weak. Certain genres, such as documentaries and music films, tend to earn higher average ratings. Language also plays a role: non-English films often score highly when enough votes are available.

Further analysis could include modeling interactions between these factors or predicting rating categories with classification techniques.