library(tidyverse)
library(readr)
IMDb_basic_messy <- read_tsv("~/Desktop/IMDb_dataset/title.basics.tsv", show_col_types = FALSE)
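# Keep six columns by position, then keep only rows typed as movies.
IMDb_basic_messy <- IMDb_basic_messy[, c(1:3, 6, 8:9)] %>%
  filter(titleType == "movie") %>%
  # The pattern matches a backslash followed by an uppercase letter, i.e.
  # IMDb's "\N" missing-value marker; the emptied strings then become NA.
  mutate(runtimeMinutes = str_remove(runtimeMinutes, "\\\\[:upper:]")) %>%
  mutate(genres = str_remove(genres, "\\\\[:upper:]")) %>%
  mutate(runtimeMinutes = na_if(runtimeMinutes, "")) %>%
  mutate(genres = na_if(genres, ""))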
IMDb_ratings_messy <- read_tsv("~/Desktop/IMDb_dataset/title.ratings.tsv", show_col_types = FALSE)
IMDb_basic_messy <- left_join(IMDb_basic_messy, IMDb_ratings_messy, by="tconst")
IMDb_crew_messy <- read_tsv("~/Desktop/IMDb_dataset/title.crew.tsv", show_col_types = FALSE)
IMDb_basic_messy <- left_join(IMDb_basic_messy, IMDb_crew_messy, by="tconst")
# Rename the director ID column so it can be joined to name.basics on nconst.
IMDb_basic_messy <- IMDb_basic_messy %>%
  rename(nconst = directors)
IMDb_names_messy <- read_tsv("~/Desktop/IMDb_dataset/name.basics.tsv", show_col_types = FALSE)
IMDb_basic_messy <- left_join(IMDb_basic_messy, IMDb_names_messy, by="nconst")
IMDb_basic_messy <- drop_na(IMDb_basic_messy)
IMDb_films_messy <- IMDb_basic_messy[, c(3:8, 11)] %>%
  filter(str_detect(genres, "Horror")) %>%
  arrange(numVotes)
# Rows 19174 to 20173 of the vote-sorted table (1,000 films).
IMDb_horror <- IMDb_films_messy[19174:20173, ]
IMDb_horror <- separate(IMDb_horror,
                        genres,
                        sep = ",",
                        into = c("primaryGenre", "secondaryGenre", "tertiaryGenre"))
IMDb_horror <- IMDb_horror[-c(14, 104, 180, 649), ] %>%
  mutate(startYear = as.numeric(startYear)) %>%
  mutate(runtimeMinutes = as.numeric(runtimeMinutes))
INFO6270_Lab_5(veronica_kerrigan)
IMDb Horror Movie Data
This section contains the code for a tibble called “IMDb_horror.” You, the viewer, can see the code because I decided that you could. If I did not want you to see it, I could have used #| echo: false.
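The code above picks the rows for IMDb_horror by hard-coded position (rows 19174 to 20173 after sorting by numVotes). As a rough sketch of a less brittle alternative, assuming the goal is simply the most-voted films, dplyr's `slice_max()` can select them by value instead. The `toy_films` table and its values are made up for illustration and are not IMDb data.

```r
library(dplyr)

# Hypothetical stand-in for the filtered horror table: made-up rows.
toy_films <- tibble(
  primaryTitle = c("Film A", "Film B", "Film C", "Film D", "Film E"),
  numVotes     = c(120, 900, 45, 500, 300)
)

# Keep the n most-voted films; with_ties = FALSE guarantees exactly n rows.
top_films <- toy_films %>%
  slice_max(numVotes, n = 2, with_ties = FALSE)

top_films
```

With the real table, `n = 1000` would replace the row range. Note that `slice_max()` changes the row order, so any later removal of rows by position would need to be revisited.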
The First Visualization
ggplot(IMDb_horror) +
  aes(x = startYear, y = averageRating) +
  geom_point(aes(colour = primaryGenre)) +
  labs(x = "Release Year",
       y = "IMDb Rating",
       title = "Ratings of Popular Horror Movies on IMDb Over Time",
       caption = "Source: https://datasets.imdbws.com/",
       colour = "Subgenres") +
  theme_light()
Doesn’t it appear that the ratings for horror movies are declining over time? It certainly looks like that to me! I don’t think it’s because horror movies are getting worse, though.
Remember, this data covers only the most popular horror movies on IMDb. I just don’t think that people go out of their way to watch and then review very old and very bad horror movies, so the old films that still get votes are probably the good ones.
But if I wanted to know this for sure, I would have to do more data analysis.
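As a minimal sketch of what that extra analysis might look like, one option is to fit a linear model of rating on release year and to check the rank correlation as well. The `toy_horror` data frame below is a made-up stand-in for IMDb_horror, so the numbers it produces say nothing about the real data.

```r
# Hypothetical stand-in for IMDb_horror: made-up values, for illustration only.
toy_horror <- data.frame(
  startYear     = c(1960, 1975, 1990, 2005, 2020),
  averageRating = c(7.5, 7.0, 6.4, 6.1, 5.6)
)

# Linear regression of rating on release year; the slope is the estimated
# change in rating per year.
trend_fit <- lm(averageRating ~ startYear, data = toy_horror)
slope <- coef(trend_fit)[["startYear"]]

# Spearman rank correlation as a check that does not assume a straight line.
rank_cor <- cor(toy_horror$startYear, toy_horror$averageRating,
                method = "spearman")

summary(trend_fit)
```

Running the same fit on the full set of horror titles, before the popularity cut, would be one way to see whether the downward trend is a product of only looking at popular films.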
The Second Visualization
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(hrbrthemes)
library(viridis)
p <- IMDb_horror %>%
  ggplot(aes(x = primaryGenre, y = averageRating, fill = primaryGenre)) +
  geom_violin(width = 1.0, size = 0.1) +
  xlab("") +
  labs(y = "IMDb Rating",
       title = "Distribution of Ratings by Subgenre",
       subtitle = "Horror Movies Popular on IMDb",
       caption = "Source: https://datasets.imdbws.com/",
       fill = "Subgenres") +
  theme_light() +
  # Applied after theme_light() so the complete theme does not reset it.
  theme(legend.position = "none")
p
IMDb does not currently have a dedicated system for classifying horror movies into subgenres. Therefore all horror subgenres are classified as ‘Horror.’ This is annoying, but also not my fault.
Still, even though it is not a perfect graph, it is nice to look at!
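Since every film here carries the catch-all ‘Horror’ label, one rough way to get at subgenres is to count the other genres listed alongside it. The sketch below assumes a table shaped like IMDb_horror after the genres column has been split into three; `toy_horror` and its rows are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for IMDb_horror after separate(): made-up rows.
toy_horror <- tibble(
  primaryTitle   = c("Film A", "Film B", "Film C", "Film D"),
  primaryGenre   = c("Horror", "Comedy", "Horror", "Horror"),
  secondaryGenre = c("Thriller", "Horror", NA, "Mystery"),
  tertiaryGenre  = c(NA, NA, NA, "Thriller")
)

# Stack the three genre columns into one, drop the empty slots and the
# "Horror" label itself, then count what is left.
co_listed <- toy_horror %>%
  pivot_longer(
    cols = c(primaryGenre, secondaryGenre, tertiaryGenre),
    names_to = "slot",
    values_to = "genre",
    values_drop_na = TRUE
  ) %>%
  filter(genre != "Horror") %>%
  count(genre, sort = TRUE)

co_listed
```

The counts could then feed a bar chart, which might be easier to read than violins for genres with only a handful of films.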