Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
Running Code
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
# I chose Movielens for this assignment, dslabs for the dataset; tidyverse for pipe-based wrangling and ggplot2
library(dslabs)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# --- Data Preparation ---
# To analyze user behavior by calculating:
# 1. Total movies rated (Activity level)
# 2. Average rating given (Leniency)
# 3. Number of unique genres rated (Taste variety)
user_profiles <- movielens %>%
  group_by(userId) %>%
  summarize(
    total_rated = n(),
    avg_user_rating = mean(rating),
    genre_diversity = n_distinct(genres),
    .groups = "drop"
  ) %>%
  # I used Filter for users with at least 20 ratings to avoid outliers
  filter(total_rated >= 20)

# Multivariable Scatterplot
# X: Total Movies Rated (Continuous - Activity)
# Y: Average Rating Given (Continuous - Leniency)
# Color: Genre Diversity (Third Variable - Continuous)
ggplot(user_profiles, aes(x = total_rated, y = avg_user_rating)) +
  # geom_point with transparency to handle overplotting of thousands of users
  geom_point(aes(color = genre_diversity), alpha = 0.5, size = 1.5) +
  # Adding a non-default "Cividis" color scale (colorblind friendly and vibrant)
  scale_color_viridis_c(option = "cividis", name = "Unique Genres") +
  # Customizing labels and data source caption [DS Labs]
  labs(
    title = "The 'Power User' Profile: Activity vs. Leniency",
    subtitle = "Relationship between total ratings provided and average score given by 6,000+ users",
    x = "Total Movies Rated (User Activity)",
    y = "Average Rating Given (User Leniency)",
    caption = "Data Source: DS Labs"
  ) +
  # Applying a non-default 'Classic' theme with custom modifications
  theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = 14, color = "#2c3e50"),
    axis.title = element_text(face = "italic"),
    panel.grid.major.y = element_line(color = "gray90"),
    legend.position = "bottom"
  )
For this visualization, I used the movielens dataset from the dslabs package, which contains thousands of movie ratings and their metadata. To create the graph, I first used dplyr to group the data by userId and calculate, for each user, the total number of ratings (activity level), the average rating given (leniency), and the count of unique genres reviewed (diversity). I kept only users with at least 20 ratings to avoid outliers. I then used ggplot2 to construct a multivariable scatterplot, mapping activity level to the x-axis and average rating to the y-axis. To incorporate a third continuous variable, I colored the points by genre diversity using the cividis palette, a non-default scale, as the assignment instructions required. Finally, I customized the appearance with theme_classic() (which I had to research to learn how to use) and added descriptive labels and a data source caption so that the findings are clear, professional, and concise.
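To make the aggregation step easier to follow, here is a minimal sketch of the same group_by() and summarize() pattern on a tiny made-up table. The toy_ratings data and the toy_profiles name are hypothetical and only stand in for movielens; they are not taken from the real dataset.

```r
library(dplyr)

# Hypothetical toy ratings (made up for illustration, not real movielens rows)
toy_ratings <- data.frame(
  userId = c(1, 1, 1, 2, 2),
  rating = c(4, 5, 3, 2, 2),
  genres = c("Comedy", "Drama", "Comedy", "Action", "Action")
)

# One row per user: how many ratings, the mean rating, and distinct genre values
toy_profiles <- toy_ratings %>%
  group_by(userId) %>%
  summarize(
    total_rated     = n(),
    avg_user_rating = mean(rating),
    genre_diversity = n_distinct(genres),
    .groups = "drop"
  )

print(toy_profiles)
```

In this sketch, user 1 ends up with 3 ratings, an average of 4, and 2 distinct genre values, while user 2 has 2 ratings, an average of 2, and 1 distinct genre value. Note that n_distinct(genres) counts distinct values of the genres column as written, so if a column stores combined labels (for example a pipe-separated string), each combination would count as its own value.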