Loading libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)
library(stringr)
data(package="dslabs")

Project Breakdown

I will use the “movielens” dataset to compare the ratings of movie genres for movies released from 2000 to 2010. The genres are action, comedy, thriller, and drama.

data("movielens")

Let’s take a look at the release years in the dataset

unique(movielens$year)
##   [1] 1995 1941 1996 1981 1989 1978 1959 1982 1992 1991 1979 1971 1980 1988 1998
##  [16] 1986 1974 1994 1993 1990 1970 1987 1983 1997 1999 1984 2000 2002 2003 2004
##  [31] 2006 2008 2009 1977 1937 1940 1972 1958 1939 1950 1964 1951 1975 1960 1985
##  [46] 1962 1976 1942 1967 1955 1961 1953 1928 1973 1965 2001 2005 1957 1954 1968
##  [61] 1966 2007 2010 2011 2012 2013 1952 1963 1945 1946 1949 1948 1931 1969 1927
##  [76] 1933 1956 1944 1936 1925 1929 1935 2014 2015 2016 1922 1947 1926 1920 1938
##  [91] 1934 1930 1943 1921 1932 1924   NA 1915 1902 1923 1918 1917 1916 1919

OK, now let’s keep only the movies released from 2000 to 2010

movies <- movielens %>%
  filter(year >= 2000, year <= 2010, genres %in% c("Comedy", "Action", "Thriller", "Drama")) %>%
  group_by(genres, year, title) %>%
  summarise(movie_ratings = sum(rating)) 
## `summarise()` has grouped output by 'genres', 'year'. You can override using
## the `.groups` argument.
movies
## # A tibble: 529 × 4
## # Groups:   genres, year [33]
##    genres  year title                                movie_ratings
##    <fct>  <int> <chr>                                        <dbl>
##  1 Action  2001 Kiss of the Dragon                            16  
##  2 Action  2001 Last Castle, The                              14  
##  3 Action  2004 Walking Tall                                   7  
##  4 Action  2007 Crows Zero (Kurôzu zero)                       7.5
##  5 Action  2008 Never Back Down                                7.5
##  6 Action  2008 Ong-Bak 2: The Beginning (Ong Bak 2)          10  
##  7 Action  2010 13 Assassins (Jûsan-nin no shikaku)           16.5
##  8 Action  2010 Ip Man 2                                      15  
##  9 Comedy  2000 Bamboozled                                    12  
## 10 Comedy  2000 Bedazzled                                     42  
## # ℹ 519 more rows

The code above runs

However, it only keeps movies whose genres value is exactly one of the four names. For example, it finds only 8 action movies for the whole decade, which is clearly too few. Movies tagged with more than one genre are dropped, so I need code that returns every action movie even when it also carries other genre names such as adventure.
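
One possible way around the exact-match filter above is to split each multi-genre value into one row per genre before filtering. This is only a sketch: it assumes the genres column stores combined genres as a single string separated by "|" (for example "Action|Adventure"), and the names target_genres and movies_by_genre are made up for illustration. A movie tagged with two of the target genres is counted once under each.

```r
library(dplyr)
library(tidyr)
library(dslabs)

data("movielens")

# Genres of interest (hypothetical helper name)
target_genres <- c("Action", "Comedy", "Thriller", "Drama")

movies_by_genre <- movielens %>%
  # Keep only movies released from 2000 to 2010 (rows with NA year are dropped)
  filter(year >= 2000, year <= 2010) %>%
  # genres is a factor; convert to character before splitting
  mutate(genres = as.character(genres)) %>%
  # Assumption: combined genres are separated by "|", so
  # "Action|Adventure" becomes two rows, "Action" and "Adventure"
  separate_rows(genres, sep = "\\|") %>%
  # Now an exact match on a single genre name catches multi-genre movies too
  filter(genres %in% target_genres) %>%
  group_by(genres, year, title) %>%
  summarise(movie_ratings = sum(rating), .groups = "drop")

movies_by_genre
```

Setting `.groups = "drop"` returns an ungrouped tibble and silences the grouping message that `summarise()` printed earlier.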

unique(movielens$rating)
##  [1] 2.5 3.0 2.0 4.0 3.5 1.0 5.0 4.5 1.5 0.5