INFO6270_Lab_5(veronica_kerrigan)

Author

Veronica Kerrigan

IMDb Horror Movie Data

This section contains the code for a tibble called “IMDb_horror.” You, the viewer, can see the code because I decided that you could. If I did not want you to see it, I could have used #| echo: false.

library(tidyverse)
library(readr)


IMDb_basic_messy <- read_tsv("~/Desktop/IMDb_dataset/title.basics.tsv", show_col_types = FALSE)


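# Keep tconst, titleType, primaryTitle, startYear, runtimeMinutes, and genres
IMDb_basic_messy <- IMDb_basic_messy[,c(1:3,6,8:9)] %>%
  filter(titleType == "movie") %>%
  # Strip the backslash-plus-capital marker (IMDb writes missing values as \N),
  # then turn the empty strings that are left into NA
  mutate(runtimeMinutes = str_remove(runtimeMinutes, "\\\\[:upper:]")) %>%
  mutate(genres = str_remove(genres, "\\\\[:upper:]")) %>%
  mutate(runtimeMinutes = na_if(runtimeMinutes, "")) %>%
  mutate(genres = na_if(genres, ""))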

IMDb_ratings_messy <- read_tsv("~/Desktop/IMDb_dataset/title.ratings.tsv", show_col_types = FALSE)

IMDb_basic_messy <- left_join(IMDb_basic_messy, IMDb_ratings_messy, by="tconst")

IMDb_crew_messy <- read_tsv("~/Desktop/IMDb_dataset/title.crew.tsv", show_col_types = FALSE)

IMDb_basic_messy <- left_join(IMDb_basic_messy, IMDb_crew_messy, by="tconst")

IMDb_basic_messy <- IMDb_basic_messy %>%
  rename(nconst = directors)

IMDb_names_messy <- read_tsv("~/Desktop/IMDb_dataset/name.basics.tsv", show_col_types = FALSE)

IMDb_basic_messy <- left_join(IMDb_basic_messy, IMDb_names_messy, by="nconst")

IMDb_basic_messy <- drop_na(IMDb_basic_messy)

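# Keep title, year, runtime, genres, rating, votes, and director name,
# then sort the horror films from fewest to most votes
IMDb_films_messy <- IMDb_basic_messy[,c(3:8,11)] %>%
  filter(str_detect(genres, "Horror")) %>%
  arrange(numVotes)

# Take 1,000 rows from the most-voted end of the sorted table
IMDb_horror <- IMDb_films_messy[19174:20173,]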

IMDb_horror <- separate(IMDb_horror,
                        genres,
                        sep = ",",
                        into = c("primaryGenre", "secondaryGenre", "tertiaryGenre"))

IMDb_horror <- IMDb_horror[-c(14,104,180,649),] %>%
  mutate(startYear = as.numeric(startYear)) %>%
  mutate(runtimeMinutes = as.numeric(runtimeMinutes))
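
The row range used above depends on the exact snapshot of the IMDb files, so it may point at different films if the data is downloaded again. Here is a minimal sketch of one alternative, assuming dplyr 1.0.0 or later for slice_max(). The toy_films tibble is made up purely for illustration and is not IMDb data.

```r
library(dplyr)

# Hypothetical toy data standing in for the filtered horror table
toy_films <- tibble(
  primaryTitle = c("Film A", "Film B", "Film C", "Film D", "Film E"),
  numVotes     = c(120, 98000, 4500, 310000, 15)
)

# Keep the n most-voted films without relying on row positions
top_films <- toy_films %>%
  slice_max(numVotes, n = 3, with_ties = FALSE)

top_films$primaryTitle
```

This changes the row order compared with the ascending sort, so any rows dropped by position afterwards would have to be identified by a condition instead.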

The First Visualization

ggplot(IMDb_horror) +
  aes(x = startYear, y = averageRating) +
  geom_point(aes(colour = primaryGenre)) +
  labs(x = "Release Year",
       y = "IMDb Rating",
       title = "Ratings of Popular Horror Movies on IMDb Over Time",
       caption = "Source: https://datasets.imdbws.com/",
       colour = "Subgenres") +
  theme_light()

Doesn’t it appear that the ratings for horror movies are declining over time? It certainly looks like that to me! I don’t think it’s because horror movies are getting worse, though.

Remember, this data covers only the most popular horror movies on IMDb. I just don’t think that people go out of their way to watch and then review very old and very bad horror movies, so the older films that made it into this set are probably the better ones.

But if I wanted to know this for sure, I would have to do more data analysis.
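
One way to start on that extra analysis is to fit a straight line to rating against release year and look at the slope. This is only a sketch: the toy_ratings data frame below is invented for illustration, and a real check would use IMDb_horror and would still have to account for the popularity filter.

```r
# Hypothetical toy data: release year and rating for eight made-up films
toy_ratings <- data.frame(
  startYear     = c(1960, 1970, 1980, 1990, 2000, 2005, 2010, 2015),
  averageRating = c(7.8, 7.5, 7.1, 6.6, 6.4, 6.0, 5.9, 5.5)
)

# Linear model: how much does the rating change per year?
fit <- lm(averageRating ~ startYear, data = toy_ratings)
slope <- coef(fit)[["startYear"]]

# Rank-based correlation as a check that does not assume a straight line
rho <- cor(toy_ratings$startYear, toy_ratings$averageRating,
           method = "spearman")

slope  # negative in this toy data: ratings fall as the year rises
rho
```

With the real tibble, adding geom_smooth(method = "lm") to the scatter plot would draw the same kind of fitted line on top of the points.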

The Second Visualization

library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(hrbrthemes)
library(viridis)

p <- IMDb_horror %>%
  ggplot(aes(x = primaryGenre, y = averageRating, fill = primaryGenre)) +
  # linewidth sets the outline thickness (ggplot2 3.4.0 or later; older versions use size)
  geom_violin(width = 1.0, linewidth = 0.1) +
  labs(x = "",
       y = "IMDb Rating",
       title = "Distribution of Ratings by Subgenre",
       subtitle = "Horror Movies Popular on IMDb",
       caption = "Source: https://datasets.imdbws.com/",
       fill = "Subgenres") +
  theme_light()

p

IMDb does not currently have a dedicated system for classifying horror movies into subgenres. Therefore all horror subgenres are classified as ‘Horror.’ This is annoying, but also not my fault.

Still, even though it is not a perfect graph, it is nice to look at!
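
On the subgenre point: although IMDb offers no horror subgenres, each film's genres field can list up to three genres, and the genres listed alongside Horror can act as a rough stand-in. Here is a minimal sketch, assuming tidyr's separate_rows() and made-up toy data rather than real IMDb records.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same comma-separated format as the genres column
toy_genres <- tibble(
  primaryTitle = c("Film A", "Film B", "Film C", "Film D"),
  genres       = c("Horror", "Comedy,Horror", "Horror,Thriller",
                   "Comedy,Horror,Sci-Fi")
)

# One row per film-genre pair, then count the genres listed with Horror
genre_counts <- toy_genres %>%
  separate_rows(genres, sep = ",") %>%
  filter(genres != "Horror") %>%
  count(genres, sort = TRUE)

genre_counts
```

Counting every co-listed genre this way avoids depending on which genre happens to come first in the comma-separated string.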