Loading Libraries & Dataset From dslabs

library(dslabs)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# ggplot2 is already attached by library(tidyverse) above
data("movielens")
head(movielens)
##   movieId                                   title year
## 1      31                         Dangerous Minds 1995
## 2    1029                                   Dumbo 1941
## 3    1061                                Sleepers 1996
## 4    1129                    Escape from New York 1981
## 5    1172 Cinema Paradiso (Nuovo cinema Paradiso) 1989
## 6    1263                        Deer Hunter, The 1978
##                             genres userId rating  timestamp
## 1                            Drama      1    2.5 1260759144
## 2 Animation|Children|Drama|Musical      1    3.0 1260759179
## 3                         Thriller      1    3.0 1260759182
## 4 Action|Adventure|Sci-Fi|Thriller      1    2.0 1260759185
## 5                            Drama      1    4.0 1260759205
## 6                        Drama|War      1    2.0 1260759151
str(movielens)
## 'data.frame':    100004 obs. of  7 variables:
##  $ movieId  : int  31 1029 1061 1129 1172 1263 1287 1293 1339 1343 ...
##  $ title    : chr  "Dangerous Minds" "Dumbo" "Sleepers" "Escape from New York" ...
##  $ year     : int  1995 1941 1996 1981 1989 1978 1959 1982 1992 1991 ...
##  $ genres   : Factor w/ 901 levels "(no genres listed)",..: 762 510 899 120 762 836 81 762 844 899 ...
##  $ userId   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ rating   : num  2.5 3 3 2 4 2 2 2 3.5 2 ...
##  $ timestamp: int  1260759144 1260759179 1260759182 1260759185 1260759205 1260759151 1260759187 1260759148 1260759125 1260759131 ...

Histogram of the Number of Ratings by Movie Release Year

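# Only `year` is mapped: the constant fill in geom_histogram() would override
# any fill aesthetic, so `fill = rating` had no effect and is dropped.
ggplot(movielens, aes(x = year)) +
  geom_histogram(color = "red", fill = "black") +
  labs(
    title = "Number of Ratings by Movie Release Year",
    x = "Release Year",
    y = "# of Ratings",
    caption = "Source: movielens (dslabs)"
  ) +
  theme_minimal()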
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 7 rows containing non-finite outside the scale range
## (`stat_bin()`).
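
The message and warning above can be addressed directly. The following is a minimal sketch, not the only way to do it: the five-year `binwidth` is an illustrative choice of mine, and `rated` and `n_missing_year` are names introduced here. It counts the rows with a missing release year (which should correspond to the rows the warning says were removed), drops them explicitly, and sets the bin width so that each bar covers a known span of years.

```r
library(dslabs)
library(ggplot2)

data("movielens")

# Rows with no release year; these are what stat_bin() drops as non-finite.
n_missing_year <- sum(is.na(movielens$year))
n_missing_year

# Keep only the rows with a known release year.
rated <- movielens[!is.na(movielens$year), ]

# binwidth = 5 is an assumed, illustrative value: one bar per five release years.
p <- ggplot(rated, aes(x = year)) +
  geom_histogram(binwidth = 5, color = "red", fill = "black") +
  labs(
    title = "Number of Ratings by Movie Release Year",
    x = "Release Year",
    y = "# of Ratings",
    caption = "Source: movielens (dslabs)"
  ) +
  theme_minimal()

p
```

With an explicit `binwidth`, the height of each bar has a clear meaning (ratings for movies released within a five-year window), which makes the shape of the distribution easier to describe.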

Conclusion

From the dslabs library, I chose the dataset titled “movielens” and thought it would be interesting to compare its year and rating variables. Each row of this dataset is one user's rating of one movie, along with the movie's title, release year, and genres. At first, I was going to create a scatter plot of these two variables, but it was hard to read, likely because roughly 100,000 points pile up on a small set of rating values. I made this histogram instead. It uses only the year variable: each bar counts how many ratings went to movies released in that range of years, so it does not show the rating values themselves. The distribution is left-skewed, with a long tail toward older release years and most ratings going to movies with later release years. This suggests that more recent movies receive far more ratings than older ones, which could be because newer movies are more accessible to the people submitting ratings, although the graph alone cannot confirm the cause.
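
Since the histogram only counts ratings, one way to actually bring the rating variable into the comparison is to summarize it per release year. The sketch below is an assumption about how that could be done, not part of the original analysis; `by_year`, `n_ratings`, and `mean_rating` are names I introduce for illustration.

```r
library(dslabs)
library(dplyr)

data("movielens")

# One row per release year: how many ratings it received and their average.
by_year <- movielens %>%
  filter(!is.na(year)) %>%
  group_by(year) %>%
  summarise(
    n_ratings = n(),
    mean_rating = mean(rating),
    .groups = "drop"
  ) %>%
  arrange(desc(n_ratings))

# Release years with the most ratings come first.
head(by_year)
```

The `n_ratings` column carries the same information as the histogram, while `mean_rating` shows whether the typical rating changes with release year. Averages for years with very few ratings should be read with caution.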