dslabsAssignment

Author

Dekker Spielman

# Load the required libraries
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dslabs)

# Create one row per movie with its release year and average rating
movieless <- movielens |>
  summarize(.by = c(movieId, year), rating = mean(rating)) |>
  arrange(year, movieId)

# Bars count movies per year; ratings are multiplied by 50 to share the count axis,
# and the secondary axis scales them back to the 0-5 rating range
ggplot(data = movieless) +
  geom_bar(aes(year)) +
  geom_point(aes(year, rating * 50, color = rating), alpha = 0.2) +
  geom_smooth(aes(year, rating * 50), color = "orange") +
  scale_y_continuous(sec.axis = sec_axis(transform = ~ . * 0.02, name = "Rating (out of 5)")) +
  theme_dark() +
  labs(x = "Year",
       y = "Number of Movies",
       title = "Rating and Number of Movies by Year",
       color = "")

Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_count()`).

`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 5 rows containing missing values or values outside the scale range
(`geom_point()`).

For this assignment I used the movielens dataset from the dslabs package. I had several ideas for the graph at first, but I settled on this one because I wasn’t sure how to build what I originally wanted. I had hoped to represent different genres with different colors, but in this dataset each movie’s genres are stored together in a single string, which makes sorting and filtering by genre more difficult. My final graph shows the number of movies released each year as bars, with each movie’s average rating overlaid as points and a smoothed trend line. Combining both chart types in one graph was a little difficult: the ratings are multiplied by 50 so they can share the count axis, and a secondary axis scales them back to the 5-point rating scale. I think it ended up working well.
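
One possible way to handle the multi-genre strings, sketched here on made-up data rather than on movielens itself, is to split each string into one row per movie-genre pair with `tidyr::separate_longer_delim()`. The sketch assumes the genres are separated by a `|` character; the `toy_movies` table and its titles are hypothetical and only stand in for the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data imitating a `genres` column that packs
# several genres into one string (assumed delimiter: "|")
toy_movies <- tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  year   = c(1995, 1995, 2001),
  genres = c("Comedy|Romance", "Action", "Action|Comedy|Thriller")
)

# One row per movie-genre pair; as.character() guards against
# the column being stored as a factor
movies_by_genre <- toy_movies |>
  mutate(genres = as.character(genres)) |>
  separate_longer_delim(genres, delim = "|")

# Count movies per genre; a movie with several genres is
# counted once for each of its genres
genre_counts <- movies_by_genre |>
  count(genres, name = "n_movies")

print(genre_counts)
```

With the data in this long form, `genres` could be mapped to `fill` or `color` in ggplot. The trade-off is that a movie with several genres appears in several rows, so per-genre counts no longer add up to the total number of movies.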