This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.

Import data

library(tidyverse)
theme_set(theme_light())

horror_movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv")

Description of the data and definition of variables

The data set consists of 3,328 horror movies. Each movie has a genre variable, which takes values such as Horror/Comedy and Horror/Drama. Other variables include cast, rating, director, and release date.

Visualize data

Hint: One graph of your choice.

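horror_movies <- horror_movies %>%
  arrange(desc(review_rating)) %>%
  # pull the four-digit year out of the trailing "(YYYY)" in the title
  extract(title, "year", "\\((\\d\\d\\d\\d)\\)$", remove = FALSE, convert = TRUE) %>%
  # drop currency symbols and commas so that budget is numeric
  mutate(budget = parse_number(budget))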
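
To see what the cleaning step above does, here is a minimal sketch on a made-up tibble. The rows in `toy` are invented for illustration and are not taken from the data set, and the pattern is assumed to be meant to match four digits inside parentheses at the end of the title.

```r
library(dplyr)
library(tidyr)
library(readr)

# Toy rows, invented for illustration only
toy <- tibble(
  title  = c("Example Movie (2017)", "Another Film (2012)", "No Year Here"),
  budget = c("$30,000", "$1,500,000", NA)
)

toy_clean <- toy %>%
  # capture the four digits inside the trailing parentheses; titles
  # without a match get NA, and convert = TRUE turns the text into integers
  extract(title, "year", "\\((\\d\\d\\d\\d)\\)$", remove = FALSE, convert = TRUE) %>%
  # parse_number() keeps the number and drops the currency symbol and commas
  mutate(budget = parse_number(budget))

print(toy_clean)
```

Because `parse_number()` discards the currency symbol, budgets recorded in different currencies would end up on the same numeric scale, which is worth keeping in mind when reading the histogram.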

Most of the movies were released in 2012 or later.
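
One way to check a statement like this is to count movies per year and compute the share at or after a cutoff. The sketch below uses invented years, and `share_since()` is a hypothetical helper, not something from the screencast; the same calls could be pointed at the `year` column of `horror_movies`.

```r
library(dplyr)

# Hypothetical helper: share of non-missing years at or after a cutoff
share_since <- function(years, cutoff) {
  mean(years >= cutoff, na.rm = TRUE)
}

# Toy years, invented for illustration only
toy_years <- tibble(year = c(2010L, 2012L, 2013L, 2015L, 2017L, NA))

# Number of movies per year
print(count(toy_years, year))

# Share of movies with a known year of 2012 or later
toy_share <- share_since(toy_years$year, 2012)
print(toy_share)
```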

horror_movies %>%
  count(language, sort = TRUE)
## # A tibble: 188 x 2
##    language             n
##    <chr>            <int>
##  1 English           2421
##  2 Spanish             96
##  3 Japanese            77
##  4 <NA>                71
##  5 Hindi               37
##  6 Filipino|Tagalog    34
##  7 Thai                34
##  8 English|Spanish     30
##  9 Turkish             30
## 10 Tamil               29
## # … with 178 more rows

horror_movies %>%
  count(genres, sort = TRUE)
## # A tibble: 262 x 2
##    genres                               n
##    <chr>                            <int>
##  1 Horror                            1059
##  2 Horror| Thriller                   474
##  3 Comedy| Horror                     245
##  4 Horror| Mystery| Thriller          172
##  5 Drama| Horror| Thriller            161
##  6 Drama| Horror                       72
##  7 Horror| Sci-Fi| Thriller            66
##  8 Drama| Horror| Mystery| Thriller    53
##  9 Action| Horror| Thriller            48
## 10 Horror| Mystery                     47
## # … with 252 more rows

horror_movies %>%
  ggplot(aes(budget)) +
  geom_histogram() +
  scale_x_log10(labels = scales::dollar)

What is the story behind the graph?

The graph shows how horror movie budgets are distributed, and it is meant to relate how much was spent to shoot a movie to how well the movie did. I thought this was interesting because most higher-budget movies did better, yet some lower-budget movies still had a decent turnout.
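
A histogram of budget alone does not show how movies performed, so a scatter plot of rating against budget may make this story easier to see. The sketch below is one possible approach, not the screencast's method: `plot_budget_vs_rating()` is a hypothetical helper, the toy rows are invented, and it assumes numeric `budget` and `review_rating` columns like those in `horror_movies` after cleaning.

```r
library(ggplot2)

# Hypothetical helper; expects numeric `budget` and `review_rating` columns
plot_budget_vs_rating <- function(movies) {
  ggplot(movies, aes(budget, review_rating)) +
    geom_point(alpha = 0.5) +
    # straight-line trend to show the overall direction
    geom_smooth(method = "lm") +
    # same log scale and dollar labels as the histogram
    scale_x_log10(labels = scales::dollar)
}

# Toy rows, invented for illustration only
toy_movies <- data.frame(
  budget        = c(1e4, 5e4, 1e5, 1e6, 1e7),
  review_rating = c(4.2, 5.0, 4.8, 5.9, 6.1)
)

p <- plot_budget_vs_rating(toy_movies)
```

Calling `plot_budget_vs_rating(horror_movies)` after the cleaning step would draw the same plot for the real data.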

Hide the messages, but display the code and its results on the webpage.
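
A common way to do this in R Markdown is to set knitr chunk options once in a setup chunk near the top of the document. This is a minimal sketch and assumes the page is knitted with knitr; the same options can also be set on individual chunks instead.

```r
# echo = TRUE shows the code, message = FALSE hides package messages;
# results are displayed by default
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```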

Write your name for the author at the top.

Use the correct slug.