I created a new variable that divides movies into two groups, low and high, based on their average ratings.
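A minimal sketch of how such a grouping variable could be built is shown here. The cutoff I actually used is not stated above, so this example assumes the median average rating as the split point. The column name `rating_group` and the toy values are hypothetical and only for illustration.

```r
# Toy data with made-up values, standing in for the cleaned MovieLens summary.
movies <- data.frame(
  title       = c("Movie A", "Movie B", "Movie C", "Movie D"),
  num_ratings = c(120, 45, 300, 15),
  avg_rating  = c(3.9, 2.4, 4.3, 3.1)
)

# Assumption: split at the median average rating (the real cutoff may differ).
cutoff <- median(movies$avg_rating)

# Hypothetical column name: movies at or above the cutoff are "high", the rest "low".
movies$rating_group <- ifelse(movies$avg_rating >= cutoff, "high", "low")

# Count how many movies fall into each group.
table(movies$rating_group)
```

A fixed threshold (for example, a specific rating value) would work the same way by replacing `cutoff` with that number.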
Bubble plot of Popular Movies vs Average Ratings
ggplot(data_cleaning,
       aes(x = num_ratings, y = avg_rating,
           color = avg_rating, size = num_ratings)) +
  geom_point(alpha = 0.6) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(title = "Popular Movies vs Average Ratings",
       x = "Number of Ratings",
       y = "Average Rating",
       color = "Average Ratings") +
  theme_minimal(base_family = "ArialMT")
I used the MovieLens dataset to study the relationship between the number of ratings a movie receives and its average rating. I created a bubble plot with ggplot2, where the x-axis shows the number of ratings and the y-axis shows the average rating. The size of each bubble also represents how many ratings a movie has, and the color shows the average rating on a gradient from blue for lower ratings to red for higher ratings. I also used the extrafont library to set the plot font (ArialMT) and make the graph look better.
In addition, I created an interactive scatterplot using Highcharter. In this plot, movies are divided into low rating and high rating groups with different colors. When you hover over a point, you can see more information like the movie title, genre, number of ratings, and average rating.
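The interactive plot described above could look roughly like the sketch below. It is not my exact code: the data frame, its values, and the column names `title`, `genre`, and `rating_group` are assumptions for illustration. It relies on `hchart()`, `hcaes()`, `hc_tooltip()`, and `hc_title()` from the highcharter package.

```r
library(highcharter)

# Toy data with made-up values; column names are assumptions.
movies <- data.frame(
  title        = c("Movie A", "Movie B", "Movie C", "Movie D"),
  genre        = c("Drama", "Comedy", "Action", "Horror"),
  num_ratings  = c(120, 45, 300, 15),
  avg_rating   = c(3.9, 2.4, 4.3, 3.1),
  rating_group = c("high", "low", "high", "low")
)

# One scatter series per rating group; each series gets its own color.
hc <- hchart(
  movies, "scatter",
  hcaes(x = num_ratings, y = avg_rating, group = rating_group)
)

# Tooltip shown on hover: title, genre, number of ratings, average rating.
hc <- hc_tooltip(
  hc,
  headerFormat = "",
  pointFormat = paste0(
    "<b>{point.title}</b><br>",
    "Genre: {point.genre}<br>",
    "Number of ratings: {point.x}<br>",
    "Average rating: {point.y}"
  )
)

hc <- hc_title(hc, text = "Popular Movies vs Average Ratings")

hc
```

Because `group = rating_group` creates a separate series for each group, the low and high points can be toggled independently by clicking the legend.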
At first, I tried to create a bubble plot in Highcharter to match my main plot, but it looked too messy and was hard to read, so I decided to use a scatterplot instead. I also tried to combine all the points into one plot because I noticed that the low-rating and high-rating groups were displayed separately. I am not sure if what I did is correct, but I found a way to display them together, as shown in the visualization.