This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
library(tidyverse)
theme_set(theme_light())
horror_movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv")
horror_movies
## # A tibble: 3,328 x 12
## title genres release_date release_country movie_rating review_rating
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Gut … Drama… 26-Oct-12 USA <NA> 3.9
## 2 The … Horror 13-Jan-17 USA <NA> NA
## 3 Slee… Horror 21-Oct-17 Canada <NA> NA
## 4 Trea… Comed… 23-Apr-13 USA NOT RATED 3.7
## 5 Infi… Crime… 10-Apr-15 USA <NA> 5.8
## 6 In E… Horro… 2017 UK <NA> NA
## 7 Ghos… Drama… 3-Jun-14 USA NOT RATED 5.1
## 8 Para… Actio… 25-Apr-15 Japan <NA> 6.5
## 9 Stra… Horro… 28-May-17 Spain PG-13 4.6
## 10 Tuta… Comed… 7-Oct-16 India <NA> 5.4
## # … with 3,318 more rows, and 6 more variables: movie_run_time <chr>,
## # plot <chr>, cast <chr>, language <chr>, filming_locations <chr>,
## # budget <chr>
The data contains horror movies and information about each one. There are twelve variables: title, genres, release_date, release_country, movie_rating, review_rating, movie_run_time, plot, cast, language, filming_locations, and budget. The variables are mostly self-explanatory. They describe what each movie is called, when and where it was released, its budget, who was in it, what it was about, its genres (including subgenres), the content rating it was given (movie_rating), and how good reviewers judged it to be (review_rating). The goal is to study the review rating and find another variable that is connected with it.
Hint: One graph of your choice.
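horror_movies %>%
  # budget is read in as character (<chr>), so convert it to a number first;
  # parse_number() drops currency symbols, so different currencies are mixed
  mutate(budget = parse_number(budget)) %>%
  filter(!is.na(budget), !is.na(review_rating)) %>%
  ggplot(aes(budget, review_rating)) +
  geom_point() +
  geom_smooth(method = "lm")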
This graph shows the relationship between movie budget and review rating. There is no clear connection between the two variables: the points are scattered without an obvious upward or downward trend. This is interesting to me because I would have expected movies with a higher budget to get higher ratings.
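One way to back up the visual impression with a number is to compute the correlation between budget and review rating. The sketch below is only an illustration, not something taken from the screencast: `budget_rating_cor()` and `budget_num` are hypothetical names, and it assumes that `budget` is stored as text and that taking the log of the budget is a reasonable way to handle very large values. Because `parse_number()` ignores currency symbols, budgets in different currencies are treated as if they were on the same scale.

```r
library(dplyr)
library(readr)

# Hypothetical helper (not from the screencast): convert the character
# budget column to a number, then correlate log10(budget) with review_rating.
budget_rating_cor <- function(movies) {
  movies %>%
    mutate(budget_num = parse_number(budget)) %>%
    # keep only rows where both values are available and the log is defined
    filter(!is.na(budget_num), budget_num > 0, !is.na(review_rating)) %>%
    summarize(
      n = n(),
      correlation = cor(log10(budget_num), review_rating)
    )
}

# Usage, assuming horror_movies has been loaded as shown earlier:
# budget_rating_cor(horror_movies)
```

A correlation close to 0 would be consistent with the flat trend line in the graph, while a value closer to 1 or -1 would suggest a relationship the plot is hiding.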