Top Films Study

The objective of this analysis is to investigate how different film genres are represented among highly rated films. The goal is to identify patterns and trends to help inform content recommendations, production decisions, and marketing strategies for stakeholders in the entertainment industry.

The data used in this analysis was sourced from a publicly available dataset containing information on the top 5000 movies listed on IMDb, accessible through the following link: https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-movies/data.

Act 1: Votes Per Genre

The table to the left ranks the genres from most voted on to least voted on. Each total is the number of votes IMDb users have cast on the films in that genre, out of the 5000 films in the list. Breaking the votes down by genre lets us compare user activity and see which genre's audience brings the most interactivity to the site.

To calculate this, I split the Genres column so that each film has a single primary genre (Genre1). I then used this calculation to create the new table:

library(dplyr)
# Sum the votes for all films that share a primary genre
Votes <- data %>%
  group_by(Genre1) %>%
  summarize(totalvotes = sum(NumVotes, na.rm = TRUE))
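The step that splits the Genres column is described in this section but its code is not shown. A minimal sketch of one way to do it, assuming Genres is a single comma-separated string per film; the toy rows, the vote numbers, and the ", " separator are illustrative assumptions, not values from the Kaggle file:

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the real dataset; the rows, vote numbers, and the
# ", " separator are assumptions made only for illustration.
data <- tibble(
  Title    = c("The Dark Knight", "Film B"),
  Genres   = c("Action, Crime, Drama", "Comedy"),
  NumVotes = c(100, 40)
)

# Split Genres into up to three columns.
# fill = "right" pads films with fewer than three genres with NA;
# extra = "drop" discards any genres beyond the third.
data <- data %>%
  separate(Genres,
           into = c("Genre1", "Genre2", "Genre3"),
           sep = ",\\s*",
           fill = "right",
           extra = "drop",
           remove = FALSE)
```

With this layout, Genre1 holds the first listed genre, which is what the votes summary groups by.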

From the table, we conclude:

  • Action is the most voted-on genre.
  • Musicals draw comparatively few votes from the site's audience.

This suggests that action fans are more likely to take, well, action. User interactivity on the site is worth examining because it shows where audiences are already engaged online, which stakeholders can use when reaching action fans through digital and social channels.

Act 2: The Genre Breakdown

A film rarely fits a single label, so many of the films in the list are tagged with more than one genre.

In the chart above, the counts add up to far more than the 5000 films in the list. This is because every subgenre is counted as a value of its own, so a film with three genres is counted three times.

So now, instead of the information showing up as:

  • “The Dark Knight” : Action

The data now displays:

  • “The Dark Knight” : Action
  • “The Dark Knight” : Crime
  • “The Dark Knight” : Drama

To break down the genres, I used this code:

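  library(dplyr)
  library(tidyr)

  # Stack the three genre columns so each film-genre pair is its own row
  Genres <- data %>%
    pivot_longer(cols = c(Genre1, Genre2, Genre3),
                 names_to = "GenreType",
                 values_to = "Genre") %>%
    # Drop the empty slots of films with fewer than three genres
    filter(!is.na(Genre) & Genre != "") %>%
    # Count the films listed under each genre, largest first
    count(Genre, sort = TRUE)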

Here, Genre1 represents the primary genre (as used in the votes table in Act 1), and Genre2 and Genre3 represent the subgenres.
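To see why the counts add up to more than the number of films, the same reshaping can be run on a tiny example. This is only a sketch: the second row ("Film B") is made up, and only the Dark Knight genres come from the example in this section.

```r
library(dplyr)
library(tidyr)

# Two illustrative rows; "Film B" is hypothetical.
toy <- tibble(
  Title  = c("The Dark Knight", "Film B"),
  Genre1 = c("Action", "Drama"),
  Genre2 = c("Crime", NA),
  Genre3 = c("Drama", NA)
)

# Same reshaping as in the analysis: one row per film-genre pair.
toy_counts <- toy %>%
  pivot_longer(cols = c(Genre1, Genre2, Genre3),
               names_to = "GenreType",
               values_to = "Genre") %>%
  filter(!is.na(Genre) & Genre != "") %>%
  count(Genre, sort = TRUE)

nrow(toy)          # 2 films
sum(toy_counts$n)  # 4 film-genre pairs
```

Two films produce four film-genre pairs, which is the same effect that pushes the chart's total past 5000.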

From this graph, one thing stands out:

  • Drama is the most common genre among the films in IMDb's top 5000 movies list.
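A count of film-genre pairs shows which genre appears most often, but not what fraction of films carry it. One possible follow-up check, sketched here on made-up rows, divides each genre's count by the number of films. The film titles are hypothetical, and the sketch assumes each title appears only once in the data.

```r
library(dplyr)
library(tidyr)

# Hypothetical films used only to illustrate the calculation.
films <- tibble(
  Title  = c("Film A", "Film B", "Film C"),
  Genre1 = c("Action", "Drama", "Comedy"),
  Genre2 = c("Drama", NA, NA),
  Genre3 = rep(NA_character_, 3)
)

genre_share <- films %>%
  pivot_longer(cols = c(Genre1, Genre2, Genre3),
               names_to = "GenreType",
               values_to = "Genre") %>%
  filter(!is.na(Genre) & Genre != "") %>%
  # Guard against the same genre being listed twice for one film
  distinct(Title, Genre) %>%
  count(Genre, sort = TRUE) %>%
  # Fraction of all films that list this genre in any slot
  mutate(share = n / nrow(films))
```

In this toy data, Drama is listed for two of the three films. Running the same steps on the full dataset would show whether Drama covers a majority of the list or is simply the largest single genre.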

Act 3: The Finale

In summary, our analysis found that Action was the top genre for interactivity, while Drama was the genre most often listed among a film's genres. This suggests that the most marketable film, as of the current date, would fall into Action, Drama, or both. If the goal of the project is to get more viewers to interact with the film, the stakeholders should invest in an Action film. If the goal is to make a movie that lands in the top 5000, the stakeholders may want to consider a Drama.