This is an extension of the TidyTuesday assignment you have already completed. Answer the questions below using the screencast you chose for the TidyTuesday assignment.

Import data

library(tidyverse)
theme_set(theme_light())
horror_movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv")
horror_movies
## # A tibble: 3,328 x 12
##    title genres release_date release_country movie_rating review_rating
##    <chr> <chr>  <chr>        <chr>           <chr>                <dbl>
##  1 Gut … Drama… 26-Oct-12    USA             <NA>                   3.9
##  2 The … Horror 13-Jan-17    USA             <NA>                  NA  
##  3 Slee… Horror 21-Oct-17    Canada          <NA>                  NA  
##  4 Trea… Comed… 23-Apr-13    USA             NOT RATED              3.7
##  5 Infi… Crime… 10-Apr-15    USA             <NA>                   5.8
##  6 In E… Horro… 2017         UK              <NA>                  NA  
##  7 Ghos… Drama… 3-Jun-14     USA             NOT RATED              5.1
##  8 Para… Actio… 25-Apr-15    Japan           <NA>                   6.5
##  9 Stra… Horro… 28-May-17    Spain           PG-13                  4.6
## 10 Tuta… Comed… 7-Oct-16     India           <NA>                   5.4
## # … with 3,318 more rows, and 6 more variables: movie_run_time <chr>,
## #   plot <chr>, cast <chr>, language <chr>, filming_locations <chr>,
## #   budget <chr>

Description of the data and definition of variables

The data set contains horror movies and information about each one. There are 3,328 rows and twelve variables: title, genres, release_date, release_country, movie_rating, review_rating, movie_run_time, plot, cast, language, filming_locations, and budget. The variables are mostly self-explanatory: what each movie is called, when and where it was released, how long it runs, its budget, who was in it, what it is about, its genres (including subgenres), its content rating (movie_rating, for example PG-13), and how highly reviewers rated it (review_rating). Note that every variable except review_rating is stored as text (<chr>), including budget. The goal is to study review_rating and find another variable that is related to it.
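The printout above already shows several NA values in movie_rating and review_rating, so it can help to count missing values per variable before choosing one to compare with review_rating. The following is a minimal sketch, not part of the original analysis: `count_missing()` is a hypothetical helper name, and `toy_movies` is a small made-up tibble used only to illustrate the call.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the assignment):
# counts the missing values in every column of a data frame.
count_missing <- function(df) {
  df %>%
    summarise(across(everything(), ~ sum(is.na(.x))))
}

# Made-up rows for illustration only; these are not real movies.
toy_movies <- tibble(
  title         = c("A", "B", "C"),
  movie_rating  = c(NA, "PG-13", NA),
  review_rating = c(3.9, NA, 5.8)
)

count_missing(toy_movies)
```

Running `count_missing(horror_movies)` on the real data would give one count per variable, which shows how many rows are actually usable for a given comparison.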

Visualize data

Hint: One graph of your choice.

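horror_movies %>%
  # budget is stored as text (<chr>), so convert it to a number before plotting
  mutate(budget = parse_number(budget)) %>%
  filter(!is.na(budget), !is.na(review_rating)) %>%
  ggplot(aes(budget, review_rating)) +
  geom_point() +
  scale_x_log10() +  # budgets span several orders of magnitude
  geom_smooth(method = "lm")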

What is the story behind the graph?

This graph shows the relationship between a movie's budget and its review rating. It shows no clear relationship between the two variables. This is interesting to me because I would have expected movies with higher budgets to receive higher ratings.
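One way to back up this reading of the graph with a number is to compute the correlation between budget and review rating. The sketch below is an assumption rather than part of the original analysis: `budget_rating_cor()` is a hypothetical helper, it assumes budget is a text column that `readr::parse_number()` can convert, and it uses log10 of the budget because budgets vary over several orders of magnitude.

```r
library(dplyr)
library(readr)

# Hypothetical helper (an assumption, not from the assignment):
# correlation between log10(budget) and review_rating.
# Expects `budget` as text and `review_rating` as numeric.
budget_rating_cor <- function(df) {
  df %>%
    mutate(budget = parse_number(budget)) %>%
    # keep only rows where both values are usable
    filter(!is.na(budget), budget > 0, !is.na(review_rating)) %>%
    summarise(
      n = n(),
      correlation = cor(log10(budget), review_rating)
    )
}
```

Calling `budget_rating_cor(horror_movies)` returns the number of movies with both values and the correlation coefficient. A value close to 0 would support the conclusion that budget and review rating are not related, while a value closer to 1 or -1 would contradict it.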

Hide the messages, but display the code and its results on the webpage.

Write your name for the author at the top.

Use the correct slug.
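
For the instruction about hiding messages while still displaying the code and its results, one common approach is to set knitr chunk options once in a setup chunk near the top of the R Markdown file. This is a sketch under that assumption; the same options can instead be set on individual chunks.

```r
# Typically placed in a setup chunk at the top of the .Rmd file.
# echo = TRUE     keeps the code visible on the rendered page
# message = FALSE hides messages such as package start-up notes
# Results are displayed by default, so no extra option is needed for them.
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```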