This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
library(tidyverse)
theme_set(theme_light())
horror_movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv")
The data being analyzed consists of 3,328 horror movies. Each movie has fields such as genre (for example Horror/Comedy or Horror/Drama), cast, rating, director, and release date.
Hint: One graph of your choice.
horror_movies <- horror_movies %>%
arrange(desc(review_rating)) %>%
extract(title, "year", "\\((\\d\\d\\d\\d)\\)$", remove = FALSE, convert = TRUE) %>%
mutate(budget = parse_number(budget))
Most of the movies were released in 2012 or later.
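One way to check the claim about release years is to compute the share of movies at or after a cutoff year. The sketch below is only an illustration: `share_since()` is a hypothetical helper that is not part of the screencast, and `toy_movies` is made-up data used to show the expected input shape (a `year` column like the one created by the `extract()` step).

```r
library(tidyverse)

# Hypothetical helper (not from the screencast): fraction of movies
# whose parsed release year is at or after a cutoff year.
share_since <- function(movies, cutoff = 2012) {
  movies %>%
    filter(!is.na(year)) %>%                     # drop titles with no parsed year
    summarize(share = mean(year >= cutoff)) %>%
    pull(share)
}

# Made-up toy rows, only to show the expected input shape
toy_movies <- tibble(
  title = c("Toy A (2010)", "Toy B (2013)", "Toy C (2016)", "Toy D (2017)"),
  year  = c(2010L, 2013L, 2016L, 2017L)
)

share_since(toy_movies)  # 3 of the 4 toy rows are from 2012 or later
```

On the real data, `share_since(horror_movies)` should return the proportion of movies released in 2012 or later, and `horror_movies %>% count(year)` would show the full breakdown by year.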
horror_movies %>%
count(language, sort = TRUE)
## # A tibble: 188 x 2
## language n
## <chr> <int>
## 1 English 2421
## 2 Spanish 96
## 3 Japanese 77
## 4 <NA> 71
## 5 Hindi 37
## 6 Filipino|Tagalog 34
## 7 Thai 34
## 8 English|Spanish 30
## 9 Turkish 30
## 10 Tamil 29
## # … with 178 more rows
horror_movies %>%
count(genres, sort = TRUE)
## # A tibble: 262 x 2
## genres n
## <chr> <int>
## 1 Horror 1059
## 2 Horror| Thriller 474
## 3 Comedy| Horror 245
## 4 Horror| Mystery| Thriller 172
## 5 Drama| Horror| Thriller 161
## 6 Drama| Horror 72
## 7 Horror| Sci-Fi| Thriller 66
## 8 Drama| Horror| Mystery| Thriller 53
## 9 Action| Horror| Thriller 48
## 10 Horror| Mystery 47
## # … with 252 more rows
horror_movies %>%
ggplot(aes(budget)) +
geom_histogram() +
scale_x_log10(labels = scales::dollar)
The story behind the graph is how horror movie budgets are distributed, plotted on a log scale so that both small and large budgets are visible. I chose it to compare how much was spent to shoot each movie with how well the movie did. I thought this was interesting because most higher-budget movies did better, but some lower-budget movies still had a decent turnout.
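The histogram shows budgets only, so a scatter plot of review rating against budget may be a more direct way to look at the budget and outcome relationship. This is a minimal sketch, assuming `review_rating` is a reasonable measure of how well a movie did. `plot_budget_vs_rating()` is a hypothetical helper and `toy_budgets` is made-up data, not values from the data set.

```r
library(tidyverse)

# Hypothetical helper: review rating against budget, with budget on a log scale.
plot_budget_vs_rating <- function(movies) {
  movies %>%
    filter(!is.na(budget), budget > 0, !is.na(review_rating)) %>%
    ggplot(aes(budget, review_rating)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm") +                 # straight-line trend for reference
    scale_x_log10(labels = scales::dollar) +
    labs(x = "Budget (log scale)", y = "Review rating")
}

# Made-up toy rows, only to show the expected input shape
toy_budgets <- tibble(
  budget        = c(1e4, 5e4, 1e6, 2e7, 1e8),
  review_rating = c(4.1, 6.0, 5.2, 6.8, 7.1)
)

p <- plot_budget_vs_rating(toy_budgets)
```

Calling `plot_budget_vs_rating(horror_movies)` after the `parse_number(budget)` step should draw the same plot for the real data. An upward trend line would support the observation that higher budgets tend to go with better ratings.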