DS LAB

Author

Gamaliel Ngouafon

Loading Libraries

library(tidyverse)
library(RColorBrewer)
library(lubridate)
library("dslabs")

Loading Datasets

data(movielens, package = "dslabs")

my_movielens <- movielens

Modifying dates

my_movielens1 <- my_movielens |>
  mutate(date = as_datetime(timestamp))
head(my_movielens1)
  movieId                                   title year
1      31                         Dangerous Minds 1995
2    1029                                   Dumbo 1941
3    1061                                Sleepers 1996
4    1129                    Escape from New York 1981
5    1172 Cinema Paradiso (Nuovo cinema Paradiso) 1989
6    1263                        Deer Hunter, The 1978
                            genres userId rating  timestamp                date
1                            Drama      1    2.5 1260759144 2009-12-14 02:52:24
2 Animation|Children|Drama|Musical      1    3.0 1260759179 2009-12-14 02:52:59
3                         Thriller      1    3.0 1260759182 2009-12-14 02:53:02
4 Action|Adventure|Sci-Fi|Thriller      1    2.0 1260759185 2009-12-14 02:53:05
5                            Drama      1    4.0 1260759205 2009-12-14 02:53:25
6                        Drama|War      1    2.0 1260759151 2009-12-14 02:52:31
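The conversion above treats each timestamp as seconds since the Unix epoch (1970-01-01 00:00:00 UTC). As a minimal sketch, the first timestamp shown in the output can be converted on its own and cross-checked against base R; the variable names here are illustrative only.

```r
library(lubridate)

# First timestamp from the head() output above (seconds since the Unix epoch)
ts <- 1260759144

# lubridate::as_datetime() returns a POSIXct value, in UTC by default
converted <- as_datetime(ts)

# Base R equivalent, used here only as a cross-check
base_conv <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

format(converted, "%Y-%m-%d %H:%M:%S")  # "2009-12-14 02:52:24", matching the table
year(converted)                          # year the rating was submitted
```

Note that this date is when the rating was submitted, which is different from the movie's release year stored in the year column.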

Creating a Genre Vector

genres_list <- c("Action", "Adventure", "Animation", "Children",
                 "Comedy", "Crime", "Documentary", "Drama",
                 "Fantasy", "Film-Noir", "Horror", "IMAX",
                 "Musical", "Mystery", "Romance", "Sci-Fi",
                 "Thriller", "War", "Western")

Inclusion/Exclusion

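movie_ratings <- my_movielens1 |>
  # One row per movie-genre pair: "Action|Drama" becomes two rows
  separate_rows(genres, sep = "\\|") |>
  # Keep only the genres named in genres_list
  filter(genres %in% genres_list) |>
  # Average rating and number of ratings per genre and release year
  group_by(genres, year) |>
  summarise(avg_rating = round(mean(rating, na.rm = TRUE), 3),
            count = n()) |>
  arrange(desc(avg_rating))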
`summarise()` has regrouped the output.
ℹ Summaries were computed grouped by genres and year.
ℹ Output is grouped by genres.
ℹ Use `summarise(.groups = "drop_last")` to silence this message.
ℹ Use `summarise(.by = c(genres, year))` for per-operation grouping
  (`?dplyr::dplyr_by`) instead.
head(movie_ratings)
# A tibble: 6 × 4
# Groups:   genres [6]
  genres     year avg_rating count
  <chr>     <int>      <dbl> <int>
1 Action     1924       5        1
2 Adventure  1924       5        1
3 Horror     1939       5        1
4 Musical    1974       5        1
5 War        1948       5        1
6 IMAX       1997       4.75     2
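To make the split, filter, and summarise steps easier to follow, here is a minimal sketch on a small made-up data frame. The toy rows, the keep vector, and the toy_summary name are hypothetical and are not taken from movielens. The sketch also passes .groups = "drop", which is one way to silence the regrouping message shown above; the original pipeline does not do this.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from movielens)
toy <- tibble(
  title  = c("A", "B", "C"),
  year   = c(2000L, 2000L, 2001L),
  genres = c("Action|Drama", "Drama", "Action|(no genres listed)"),
  rating = c(4, 2, 5)
)

# Genres to keep (the inclusion list); anything else is excluded
keep <- c("Action", "Drama")

toy_summary <- toy |>
  separate_rows(genres, sep = "\\|") |>   # one row per title-genre pair
  filter(genres %in% keep) |>             # drops "(no genres listed)"
  group_by(genres, year) |>
  summarise(avg_rating = mean(rating),
            count = n(),
            .groups = "drop") |>          # return an ungrouped tibble
  arrange(genres, year)

toy_summary
```

The top rows of the real summary all have count equal to 1 or 2, so sorting by avg_rating alone mostly surfaces genre-year cells with very few ratings. The count column is worth checking before reading much into those averages.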

Highcharts Visualisation

Note: the cols palette below is categorical (one color per genre), but avg_rating is numeric, so the heatmap uses a continuous color axis instead of this vector.

cols <- c("#d73027", "#f46d43", "#fdae61", "#ffffbf", "#e0f3f8",
          "#91bfdb", "#abd9e9", "#74add1", "#2166ac", "#313695",
          "#1a9641", "#74c476", "#fd8d3c", "#9e0142", "#8856a7",
          "#c51b7d", "#f1b6da", "#b8e186", "#4d9221")
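A categorical palette assigns one unrelated color to each label, while a numeric variable such as avg_rating needs colors that change smoothly with the value. As a rough sketch of that difference, base R's colorRampPalette() can interpolate between the two endpoint colors that this document passes to hc_colorAxis(). The names ramp and shades are illustrative only.

```r
# Build an interpolating function between the low and high endpoint colors
ramp <- grDevices::colorRampPalette(c("#d73027", "#4374B3"))

# Five evenly spaced shades from the low color to the high color
shades <- ramp(5)
shades

# A numeric value can then be binned onto the ramp, e.g. ratings from 0.5 to 5
ratings <- c(0.5, 2.75, 5)
bins <- cut(ratings, breaks = seq(0.5, 5, length.out = 6), include.lowest = TRUE)
shades[as.integer(bins)]
```

Highcharts does this interpolation itself when minColor and maxColor are given, so the sketch is only meant to show why a 19-color categorical vector does not fit a numeric fill.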

library(highcharter)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use

Attaching package: 'highcharter'
The following object is masked from 'package:dslabs':

    stars
highchart() |>
  hc_add_series(
    data = movie_ratings,
    type = "heatmap",
    hcaes(x    = year ,
          y    = genres,
          value = avg_rating)) |>
  hc_xAxis(title = list(text = "Year")) |>
  hc_yAxis(title = list(text = "Genre")) |>
  hc_legend(
    align         = "right",
    layout        = "vertical",
    verticalAlign = "middle",
    title         = list(text = "Avg Rating")) |>
  hc_colorAxis(
    minColor = "#d73027", # Color for lowest value
    maxColor = "#4374B3"  # Color for highest value
  )
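When a series is added with hc_add_series(), a character y value may not be placed on a categorical axis automatically. One common approach, sketched here under that assumption, is to map each genre to a zero-based index and pass the genre names to hc_yAxis() as categories. The toy_ratings data, genre_levels, and genre_idx are hypothetical names used only for illustration.

```r
library(highcharter)

# Hypothetical toy summary standing in for movie_ratings
toy_ratings <- data.frame(
  genres     = c("Action", "Drama", "Action", "Drama"),
  year       = c(2000L, 2000L, 2001L, 2001L),
  avg_rating = c(3.5, 4.0, 2.5, 3.0)
)

# Highcharts category axes are zero-based, so subtract 1 from the position
genre_levels <- sort(unique(toy_ratings$genres))
toy_ratings$genre_idx <- match(toy_ratings$genres, genre_levels) - 1L

hc <- highchart() |>
  hc_add_series(
    data    = toy_ratings,
    type    = "heatmap",
    mapping = hcaes(x = year, y = genre_idx, value = avg_rating)
  ) |>
  hc_xAxis(title = list(text = "Year")) |>
  # The index on the y-axis is displayed using the genre names
  hc_yAxis(categories = genre_levels, title = list(text = "Genre")) |>
  hc_colorAxis(minColor = "#d73027", maxColor = "#4374B3")

hc
```

If the chart built directly from the genres column already renders with genre names on the y-axis, this extra mapping is unnecessary.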

Essay

The dataset used for this analysis is the MovieLens dataset from the dslabs package. It contains over 100,000 movie ratings submitted by users of the MovieLens platform, spanning movies released from as early as the 1910s through the 2010s. The dataset includes variables such as userId, movieId, title, genres, rating, year, and timestamp.

To create the heatmap above, which shows the average rating for each genre and release year, I started by splitting the genres column into one row per genre, so that a movie tagged Action|Drama|Thriller contributes separate rows for Action, Drama, and Thriller. I then performed the inclusion/exclusion step by keeping only the genres in genres_list, grouped the data by genre and year, and computed the average rating and the number of ratings in each group. The heatmap was built using the highcharter package, with year on the x-axis, genre on the y-axis, and cell color mapped to the average rating on a continuous color axis running from a low-value color to a high-value color.