  movieId                                   title year
1      31                         Dangerous Minds 1995
2    1029                                   Dumbo 1941
3    1061                                Sleepers 1996
4    1129                    Escape from New York 1981
5    1172 Cinema Paradiso (Nuovo cinema Paradiso) 1989
6    1263                        Deer Hunter, The 1978
                            genres userId rating  timestamp
1                            Drama      1    2.5 1260759144
2 Animation|Children|Drama|Musical      1    3.0 1260759179
3                         Thriller      1    3.0 1260759182
4 Action|Adventure|Sci-Fi|Thriller      1    2.0 1260759185
5                            Drama      1    4.0 1260759205
6                        Drama|War      1    2.0 1260759151
# Filter for the years 2007 to 2016 and create a genres_updated column
movielens_filtered <- movielens |>
  filter(year >= 2007 & year <= 2016) |>
  mutate(genres_updated = word(genres, 1, 1, sep = "\\|"))

# Create a heatmap to explore average movie ratings over the years by updated genre
heatmap_chart <- movielens_filtered |>
  group_by(year, genres_updated) |>
  summarize(mean_ratings = mean(rating))
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
# Define a color palette from Set1 with 5 distinct colors
my_palette <- brewer.pal(5, "Set1")

# Ensure that mean_ratings is treated as a continuous variable
heatmap_chart$mean_ratings <- as.numeric(heatmap_chart$mean_ratings)

heatmap <- ggplot(heatmap_chart, aes(x = year, y = genres_updated, fill = mean_ratings)) +
  geom_tile() +
  scale_fill_gradientn(colors = my_palette, name = "Mean Ratings") + # Use the Set1 palette
  labs(
    x = "Release Year",
    y = "Movie Genre",
    title = "Average Genre Ratings for Movies Released from 2007 to 2016",
    caption = "Data source: MovieLens dataset"
  ) +
  theme_minimal()

# Display the heatmap
print(heatmap)
In this analysis, I used the MovieLens dataset to explore and visualize average ratings by genre for movies released between 2007 and 2016. Restricting the data to this ten-year window keeps the focus on relatively recent releases, which are more likely to reflect current audience preferences. To create the heatmap, I first filtered the dataset to include only movies released from 2007 to 2016. I also added a new column, “genres_updated,” which keeps only the first genre listed for each movie; I treat this as the movie's primary genre.
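To make the filtering and genre-extraction step concrete, here is a minimal sketch on a made-up data frame. The `toy` data and its values are hypothetical and are not taken from MovieLens; the sketch only illustrates that `word(genres, 1, 1, sep = "\\|")` keeps the first entry of a pipe-delimited string and leaves single-genre strings unchanged.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, for illustration only (not MovieLens values)
toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  year   = c(2006, 2007, 2012, 2016),
  genres = c("Drama", "Action|Thriller", "Comedy|Romance", "Drama")
)

toy_filtered <- toy |>
  # Keep only rows whose release year falls in the 2007-2016 window
  filter(year >= 2007 & year <= 2016) |>
  # Take the first genre before the "|" separator
  mutate(genres_updated = word(genres, 1, 1, sep = "\\|"))

toy_filtered$genres_updated
```

The 2006 row is dropped by the filter, and the remaining rows are labeled by their first listed genre only, so a movie tagged "Action|Thriller" contributes to "Action" and not to "Thriller".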
Next, I grouped the filtered data by release year and updated genre. For each combination of year and genre, I calculated the mean of all user ratings, which gives the average rating for movies of that genre released in that year.
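The grouping step can be sketched on a few made-up ratings. The `toy_ratings` values are hypothetical. This version also passes `.groups = "drop"` to `summarize()`, which is one way to return an ungrouped result and avoid the `summarise()` message shown above; the original code does not set it.

```r
library(dplyr)

# Hypothetical ratings, for illustration only
toy_ratings <- data.frame(
  year           = c(2007, 2007, 2007, 2008),
  genres_updated = c("Drama", "Drama", "Action", "Drama"),
  rating         = c(3.0, 4.0, 2.5, 5.0)
)

toy_summary <- toy_ratings |>
  group_by(year, genres_updated) |>
  # One row per year/genre pair, holding the mean of its ratings
  summarize(mean_ratings = mean(rating), .groups = "drop")

toy_summary
```

The two 2007 Drama ratings collapse into a single row with a mean of 3.5, so each row of the summary corresponds to one tile in the heatmap.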
To enhance the visualization, I took five colors from the Set1 palette in the RColorBrewer package and passed them to a continuous fill scale, so the colors blend into a gradient that represents the mean rating. The heatmap shows the relationship between movie genres and their average ratings over the ten-year period. The x-axis represents the release year, the y-axis displays movie genres, and the fill color of each cell corresponds to the mean rating for that genre in a given release year.
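To illustrate how five discrete colors can serve as anchors for a continuous scale, the sketch below interpolates between them with base R's `colorRampPalette()`. This is only an approximation of the idea: the ggplot2 scale performs its own interpolation, so the exact intermediate shades in the plot may differ.

```r
library(RColorBrewer)

# The same five Set1 anchor colors used for the heatmap
my_palette <- brewer.pal(5, "Set1")

# Base R interpolation between the anchors; this stands in for the
# gradient that the fill scale builds, and is not ggplot2's own code
ramp <- colorRampPalette(my_palette)(9)

# Positions 1, 3, 5, 7, 9 fall on the anchors; the rest are blends
ramp
```

Because Set1 is a qualitative palette, neighboring anchors differ in hue and not in lightness, so the legend is needed to read which colors stand for low and high mean ratings.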
This heatmap shows how average ratings for each genre vary by release year and makes it easy to spot genres that consistently received high or low ratings during the period.
Blank cells in the heatmap mean that the filtered data contains no rows for that combination of genre and release year. One possible reason is that no movies with that first-listed genre were released in that year, which is more likely for niche or less common genres that do not have regular annual releases.
Alternatively, movies of that genre may have been released that year but received no ratings in this dataset. Since every row in the data is a rating, an unrated movie does not appear at all. This could be due to limited viewership or data collection issues.
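One way to list the blank tiles explicitly is to build every possible year/genre combination and keep the ones that are absent from the summary table. The sketch below does this on a hypothetical `toy_summary`; the names `all_cells` and `missing_cells` are introduced here for illustration only.

```r
library(dplyr)

# Hypothetical summary table, for illustration only
toy_summary <- data.frame(
  year           = c(2007, 2007, 2008),
  genres_updated = c("Action", "Drama", "Drama"),
  mean_ratings   = c(2.5, 3.5, 5.0)
)

# Every year/genre combination that could appear as a tile
all_cells <- expand.grid(
  year             = unique(toy_summary$year),
  genres_updated   = unique(toy_summary$genres_updated),
  stringsAsFactors = FALSE
)

# Combinations with no matching row in the summary are the blank tiles
missing_cells <- anti_join(all_cells, toy_summary,
                           by = c("year", "genres_updated"))

missing_cells
```

Here the only missing combination is Action in 2008. A list like this shows where the gaps are, but it cannot distinguish between the two explanations above.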