DS LAB
Loading Libraries
library(tidyverse)
library(RColorBrewer)
library(lubridate)
library("dslabs")
Loading Datasets
data(movielens, package = "dslabs")
my_movielens <- movielens
Modifying dates
my_movielens1 <- my_movielens |>
  mutate(date = as_datetime(timestamp))
head(my_movielens1)
  movieId                                   title year
1      31                         Dangerous Minds 1995
2    1029                                   Dumbo 1941
3    1061                                Sleepers 1996
4    1129                    Escape from New York 1981
5    1172 Cinema Paradiso (Nuovo cinema Paradiso) 1989
6    1263                        Deer Hunter, The 1978
                            genres userId rating  timestamp                date
1                            Drama      1    2.5 1260759144 2009-12-14 02:52:24
2 Animation|Children|Drama|Musical      1    3.0 1260759179 2009-12-14 02:52:59
3                         Thriller      1    3.0 1260759182 2009-12-14 02:53:02
4 Action|Adventure|Sci-Fi|Thriller      1    2.0 1260759185 2009-12-14 02:53:05
5                            Drama      1    4.0 1260759205 2009-12-14 02:53:25
6                        Drama|War      1    2.0 1260759151 2009-12-14 02:52:31
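The conversion above treats `timestamp` as seconds since the Unix epoch. As a small sketch of what `as_datetime()` does to a single value, the snippet below converts the timestamp from row 1 of the output. It assumes lubridate's default time zone (UTC) is used, as in the chunk above; the variable names `ts` and `dt` are only for illustration.

```r
library(lubridate)

# Timestamp taken from row 1 of the output above (seconds since 1970-01-01 UTC)
ts <- 1260759144

# as_datetime() interprets a number as epoch seconds, in UTC by default
dt <- as_datetime(ts)

format(dt, "%Y-%m-%d %H:%M:%S")  # "2009-12-14 02:52:24", matching row 1
year(dt)                         # year the rating was submitted: 2009
```

Note that this is the year the rating was submitted, which is different from the `year` column (the movie's release year).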
#mutate(date = as_datetime(timestamp))
Creating a List
genres_list <- c("Action", "Adventure", "Animation", "Children",
                 "Comedy", "Crime", "Documentary", "Drama",
                 "Fantasy", "Film-Noir", "Horror", "IMAX",
                 "Musical", "Mystery", "Romance", "Sci-Fi",
                 "Thriller", "War", "Western")
Inclusion/Exclusion
movie_ratings <- my_movielens1 |>
  separate_rows(genres, sep = "\\|") |>
  filter(genres %in% genres_list) |>
  group_by(genres, year) |>
  summarise(
    avg_rating = round(mean(rating, na.rm = TRUE), 3),
    count = n()
  ) |>
  arrange(desc(avg_rating))
`summarise()` has regrouped the output.
ℹ Summaries were computed grouped by genres and year.
ℹ Output is grouped by genres.
ℹ Use `summarise(.groups = "drop_last")` to silence this message.
ℹ Use `summarise(.by = c(genres, year))` for per-operation grouping
(`?dplyr::dplyr_by`) instead.
head(movie_ratings) # I told you to fix this to "head"
# A tibble: 6 × 4
# Groups: genres [6]
genres year avg_rating count
<chr> <int> <dbl> <int>
1 Action 1924 5 1
2 Adventure 1924 5 1
3 Horror 1939 5 1
4 Musical 1974 5 1
5 War 1948 5 1
6 IMAX 1997 4.75 2
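The regrouping message and the `# Groups: genres [6]` line above both come from summarising a data frame grouped by two variables. A minimal sketch of one way to get an ungrouped result is shown below, using the `.groups = "drop"` argument of `summarise()`. The `toy` data frame is made up for illustration and is not taken from movielens.

```r
library(dplyr)

# Hypothetical toy ratings, standing in for the long-format movielens data
toy <- tibble(
  genres = c("Action", "Action", "Drama", "Drama"),
  year   = c(1990L, 1990L, 1990L, 1991L),
  rating = c(4, 2, 5, 3)
)

toy_summary <- toy |>
  group_by(genres, year) |>
  summarise(
    avg_rating = round(mean(rating, na.rm = TRUE), 3),
    count = n(),
    .groups = "drop"   # return an ungrouped tibble, which also silences the message
  ) |>
  arrange(desc(avg_rating))

toy_summary
```

In the real output above, the highest averages all come from groups with a `count` of 1 or 2, so it may be worth filtering on `count` before reading much into the top rows.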
Highcharts Visualisation
Your colors are for categorical variables, but your avg_rating is numerical
cols <- c("#d73027", "#f46d43", "#fdae61", "#ffffbf", "#e0f3f8",
          "#91bfdb", "#abd9e9", "#74add1", "#2166ac", "#313695",
          "#1a9641", "#74c476", "#fd8d3c", "#9e0142", "#8856a7",
          "#c51b7d", "#f1b6da", "#b8e186", "#4d9221")
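To illustrate the comment about categorical versus numerical colors: a numeric variable such as avg_rating is usually shown with a sequential palette, where the order of the colors carries meaning. The sketch below uses `brewer.pal()` from RColorBrewer. The helper `rating_to_colour()` is hypothetical, and the 0.5 to 5 rating range is an assumption about the rating scale.

```r
library(RColorBrewer)

# Sequential palette: 9 ordered shades from light to dark
seq_cols <- brewer.pal(9, "YlOrRd")

# Hypothetical helper: map a rating to one of the palette's shades.
# Assumes ratings lie between lo = 0.5 and hi = 5.
rating_to_colour <- function(x, palette = seq_cols, lo = 0.5, hi = 5) {
  idx <- round((x - lo) / (hi - lo) * (length(palette) - 1)) + 1
  palette[idx]
}

# The lowest rating gets the lightest shade, the highest the darkest
rating_to_colour(c(0.5, 5))
```

A vector with one color per genre, like `cols`, has no such ordering, so it suits a categorical variable rather than a rating.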
library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
Attaching package: 'highcharter'
The following object is masked from 'package:dslabs':
stars
highchart() |>
  hc_add_series(
    data = movie_ratings,
    type = "heatmap",
    hcaes(x = year,
          y = genres,
          value = avg_rating)) |>
  hc_xAxis(title = list(text = "Year")) |>
  hc_yAxis(title = list(text = "Genre")) |>
  hc_legend(
    align = "right",
    layout = "vertical",
    verticalAlign = "middle",
    title = list(text = "Avg Rating")) |>
  hc_colorAxis(
    minColor = "#d73027", # Color for lowest value
    maxColor = "#4374B3"  # Color for highest value
  )
Essay
The dataset used for this analysis is the MovieLens dataset from the dslabs package. It contains over 100,000 movie ratings submitted by users of the MovieLens platform, spanning movies released from as early as the 1910s through the 2010s. The dataset includes variables such as userId, movieId, title, genres, rating, year, and timestamp. To create the heatmap above, which depicts the average rating per genre and release year, I started by splitting the pipe-delimited genres column into one row per genre, e.g. Action|Drama|Thriller becomes three rows: Action, Drama, and Thriller. I then applied the inclusion/exclusion step, keeping only the genres in genres_list, grouped the data by genre and year, and computed the average rating and the number of ratings in each group. The heatmap was built using the highcharter package, with year on the x-axis, genre on the y-axis, and cell color encoding the average rating on a gradient from #d73027 (lowest) to #4374B3 (highest).
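The genre-splitting step described in the essay can be sketched on a small example. The `toy` data frame below is made up for illustration (the titles and ratings are not from movielens); it uses the same `separate_rows()` call as the analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with pipe-delimited genres
toy <- tibble(
  title  = c("Movie A", "Movie B"),
  genres = c("Action|Drama|Thriller", "Comedy"),
  rating = c(4, 3)
)

# "|" is a regex metacharacter, so it is escaped as "\\|"
toy_long <- toy |>
  separate_rows(genres, sep = "\\|")

toy_long  # Movie A now occupies three rows, one per genre
```

Because each rating is repeated once per genre, a movie with several genres contributes to the average of every genre it belongs to.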