library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)
library(stringr)
data(package="dslabs")
I will use the “movielens” dataset to examine how movies in different genres were rated from 2000 to 2010. The genres are action, comedy, thriller, and drama.
data("movielens")
unique(movielens$year)
## [1] 1995 1941 1996 1981 1989 1978 1959 1982 1992 1991 1979 1971 1980 1988 1998
## [16] 1986 1974 1994 1993 1990 1970 1987 1983 1997 1999 1984 2000 2002 2003 2004
## [31] 2006 2008 2009 1977 1937 1940 1972 1958 1939 1950 1964 1951 1975 1960 1985
## [46] 1962 1976 1942 1967 1955 1961 1953 1928 1973 1965 2001 2005 1957 1954 1968
## [61] 1966 2007 2010 2011 2012 2013 1952 1963 1945 1946 1949 1948 1931 1969 1927
## [76] 1933 1956 1944 1936 1925 1929 1935 2014 2015 2016 1922 1947 1926 1920 1938
## [91] 1934 1930 1943 1921 1932 1924 NA 1915 1902 1923 1918 1917 1916 1919
movies <- movielens %>%
  filter(year >= 2000, year <= 2010, genres %in% c("Comedy", "Action", "Thriller", "Drama")) %>%
  group_by(genres, year, title) %>%
  summarise(movie_ratings = sum(rating))
## `summarise()` has grouped output by 'genres', 'year'. You can override using
## the `.groups` argument.
movies
## # A tibble: 529 × 4
## # Groups: genres, year [33]
## genres year title movie_ratings
## <fct> <int> <chr> <dbl>
## 1 Action 2001 Kiss of the Dragon 16
## 2 Action 2001 Last Castle, The 14
## 3 Action 2004 Walking Tall 7
## 4 Action 2007 Crows Zero (Kurôzu zero) 7.5
## 5 Action 2008 Never Back Down 7.5
## 6 Action 2008 Ong-Bak 2: The Beginning (Ong Bak 2) 10
## 7 Action 2010 13 Assassins (Jûsan-nin no shikaku) 16.5
## 8 Action 2010 Ip Man 2 15
## 9 Comedy 2000 Bamboozled 12
## 10 Comedy 2000 Bedazzled 42
## # ℹ 519 more rows
However, this filter only keeps movies whose genres value is exactly one of those names. For example, it finds only 8 action movies for the whole decade, which is clearly too few. So I need to write code that returns every action movie, even when it also carries other genre labels such as adventure.
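One possible way to do this is sketched below. It assumes that multi-genre movies are stored in the genres column as a single pipe-separated string (for example "Action|Adventure"), which I have not verified here for every row. The `toy` tibble, its titles, and its ratings are made up so the sketch runs on its own, and using `mean(rating)` instead of `sum(rating)` is my own choice for the illustration, not something taken from the code above.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data that mimics the assumed shape of movielens:
# one row per rating, with genres stored as a pipe-separated factor.
toy <- tibble(
  title  = c("Movie A", "Movie B", "Movie C", "Movie D"),
  year   = c(2001L, 2004L, 2008L, 1999L),
  genres = factor(c("Action", "Action|Adventure", "Comedy|Romance", "Action|Thriller")),
  rating = c(4, 3.5, 3, 5)
)

# Option 1: keep every row whose genre string contains "Action",
# whether or not other genres are listed alongside it.
action_movies <- toy %>%
  filter(year >= 2000, year <= 2010,
         str_detect(as.character(genres), "Action"))

# Option 2: split the genre string so each movie gets one row per genre,
# then filter and summarise by the individual genre names.
by_genre <- toy %>%
  filter(year >= 2000, year <= 2010) %>%
  mutate(genres = as.character(genres)) %>%
  separate_rows(genres, sep = "\\|") %>%
  filter(genres %in% c("Comedy", "Action", "Thriller", "Drama")) %>%
  group_by(genres, year) %>%
  summarise(mean_rating = mean(rating), .groups = "drop")

action_movies
by_genre
```

With Option 2 a movie tagged with several of the chosen genres is counted once under each of them, so the genre groups overlap. Swapping `toy` for `movielens` should apply the same steps to the real data, as long as the pipe-separated assumption holds.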
unique(movielens$rating)
## [1] 2.5 3.0 2.0 4.0 3.5 1.0 5.0 4.5 1.5 0.5