I will use data from previous surveys data and apply a relatively simple, easily understandable user-based collaborative filtering method to generate recommendations based on the preferences of similar users.
2.What data challenges do you anticipate?
This dataset is relatively small, and some ratings are incomplete, so the model’s results may be somewhat limited.
Rows: 7 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): critics
dbl (6): Zootopia 2, Stranger Things 5, Captain America:
Brave New World, N...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert wide data to long formatrating_long <- movies_rating |>pivot_longer(cols =-critics,names_to ="movie",values_to ="rating",values_drop_na =TRUE)glimpse(rating_long)
4.Choose one personalized recommendation algorithm.
This project uses user-to-user collaborative filtering because the dataset contains only user-movie ratings, without movie features, and the dataset is small.
# 2. Calculate similarity between two criticssimilarity_score <-function(u1, u2, data) { common <-inner_join( data %>%filter(critics == u1) %>%select(movie, rating1 = rating), data %>%filter(critics == u2) %>%select(movie, rating2 = rating),by ="movie")if (nrow(common) ==0) return(NA_real_)1/ (1+mean(abs(common$rating1 - common$rating2)))}
# 4. Recommend top 2 unrated movies for each criticall_pairs <-expand_grid(critics =unique(rating_long$critics),movie =unique(rating_long$movie))recommendations <- all_pairs |>anti_join(rating_long, by =c("critics", "movie")) |>mutate(predicted_rating =map2_dbl( critics, movie,~predict_rating(.x, .y, rating_long) )) |>group_by(critics) |>arrange(desc(predicted_rating), .by_group =TRUE) |>slice_head(n =2)clean_recommendations <- recommendations %>%mutate(across(where(is.numeric), ~round(., 3)))clean_recommendations
# A tibble: 12 × 3
# Groups: critics [6]
critics movie predicted_rating
<chr> <chr> <dbl>
1 CO stranger_things_5 4.4
2 CO now_you_see_me_now_you_don_t 4
3 HP zootopia_2 4.57
4 HP the_sponge_bob_movie_search_for_square_pants 4
5 J zootopia_2 4.18
6 J now_you_see_me_now_you_don_t 3.33
7 W the_sponge_bob_movie_search_for_square_pants 3.60
8 W now_you_see_me_now_you_don_t 2.89
9 ZH stranger_things_5 4.57
10 ZH zootopia_2 4
11 ZY zootopia_2 4.56
12 ZY the_sponge_bob_movie_search_for_square_pants 4
5.Evaluate the performance of the recommender.
I used a hold-out evaluation to test the recommender on ratings that were originally known but temporarily removed. MAE and RMSE measure how close the predicted ratings are to the actual ratings.
# A tibble: 7 × 5
critics movie rating predicted_rating error
<chr> <chr> <dbl> <dbl> <dbl>
1 CO the_sponge_bob_movie_search_for_square… 3 4 -1
2 HP stranger_things_5 1 5 -4
3 J stranger_things_5 5 5 0
4 M disneys_snow_white 1 3.59 -2.59
5 W zootopia_2 5 4 1
6 ZH now_you_see_me_now_you_don_t 4 3.59 0.409
7 ZY now_you_see_me_now_you_don_t 2 4 -2
# MAE: Mean Absolute Error; the smaller, the better# RMSE: Root Mean Square Error; the smaller, the better; it is more sensitive to large errorseval_metrics <- eval_results |>summarise(MAE =mean(abs(error)),RMSE =sqrt(mean(error^2)) )eval_metrics
# A tibble: 1 × 2
MAE RMSE
<dbl> <dbl>
1 1.57 2.03
The result shows that the recommender has an MAE of about 1.57 and an RMSE of about 2.03. This means that, on average, the predicted ratings are about 1.57 points away from the actual ratings. The RMSE is higher because it is more sensitive to larger errors, such as HP’s rating for stranger_things_5, where the actual rating was 1 but the model predicted 5.
6. Conclusion
The recommender only outputs the top two unrated movies for each critic, ranked by predicted rating.
CO: –stranger_things_5 (4.400) and –now_you_see_me_now_you_don_t (4.000).
HP: –zootopia_2 (4.571) and –the_sponge_bob_movie_search_for_square_pants (4.000).
J: –zootopia_2 (4.182) and –now_you_see_me_now_you_don_t (3.333).
W: –the_sponge_bob_movie_search_for_square_pants (3.595) and –now_you_see_me_now_you_don_t (2.889).
ZH: –stranger_things_5 (4.571) and –zootopia_2 (4.000).
ZY: –zootopia_2 (4.556) and –the_sponge_bob_movie_search_for_square_pants (4.000).