I am going to use the Open Movie Database (OMDb), a community-built database of movies and TV shows. Data like this can be used to make movie recommendations and to analyze how the ratings of movies and TV shows change over time.

I wanted to use this data set because I am a big movie head. I love to watch new movies and share my own opinions on them. My favorite director is Christopher Nolan, who has made a couple of movies that are in my all-time top 10.

The first thing that I did was load all of the libraries.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)

library(knitr)

library(httr)

library(httr2)

library(jsonlite)
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:purrr':
## 
##     flatten
library(dplyr)

I first had to request an API key, which was sent to me by email. I then had to activate the key before I could use it.
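
A key pasted into a script is visible to anyone who reads the file. One common alternative, sketched here under assumptions, is to keep the key in an environment variable and read it at run time. The helper `read_omdb_key()` and the variable name `OMDB_API_KEY` are hypothetical names chosen for this example; OMDb does not require them.

```r
# Hypothetical helper: reads the key from an environment variable named
# OMDB_API_KEY (the variable name is an assumption, not an OMDb requirement)
read_omdb_key <- function(var = "OMDB_API_KEY") {
  key <- Sys.getenv(var)
  # Sys.getenv() returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.")
  }
  key
}
```

The variable could be set for a session with `Sys.setenv()` or stored in a `.Renviron` file, which R reads at startup.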

# Store only the key itself; the request URL is built inside get_movie()
OMDB_KEY <- "cb48aab9"

This is how you get the data for a movie. You could test the API by requesting a single movie and filling in the query parameters by hand. But we want to get multiple movies, so I wrapped the request in a function instead.

get_movie <- function(title, year) {
  # Build and send the request: t = movie title, y = release year
  resp <- request("http://www.omdbapi.com/") |>
    req_url_query(
      apikey = OMDB_KEY,
      t = title,
      y = year
    ) |>
    req_perform()

  data <- fromJSON(resp_body_string(resp))

  # Ratings is parsed as a nested table with one row per rating source;
  # drop it so that each movie becomes exactly one row
  data$Ratings <- NULL

  return(as.data.frame(data))
}
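
To see what the conversion to a data frame does, it can help to parse a small reply by hand. The sketch below uses a made-up payload that only imitates the shape of an OMDb reply, including a nested `Ratings` array (that shape is an assumption here, and all values are placeholders rather than real API output).

```r
library(jsonlite)

# Made-up payload shaped like an OMDb reply (placeholder values, not real data)
sample_json <- '{
  "Title": "Example Movie",
  "Year": "2001",
  "imdbRating": "7.5",
  "imdbVotes": "12,345",
  "Ratings": [
    {"Source": "Source A", "Value": "7.5/10"},
    {"Source": "Source B", "Value": "80%"}
  ],
  "Response": "True"
}'

parsed <- fromJSON(sample_json)

# Ratings is parsed as a 2-row data frame, so converting the whole list
# repeats the movie once per rating source
rows_with_ratings <- nrow(as.data.frame(parsed))

# Dropping the nested element first leaves one row per movie
parsed$Ratings <- NULL
one_row <- as.data.frame(parsed)
rows_without_ratings <- nrow(one_row)
```

Here `rows_with_ratings` is 2 and `rows_without_ratings` is 1, which is why a nested element is worth checking for before stacking many replies together.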

Next, I entered the title of each movie along with the year it was released. Then I wrote a loop that calls get_movie() for each movie, combined the results into one data frame, and saved it to the CSV file nolan_movies.csv.

titles <- data.frame(
  title = c(
    "Following", "Memento", "Insomnia", "Batman Begins",
    "The Prestige", "The Dark Knight", "Inception",
    "The Dark Knight Rises", "Interstellar",
    "Dunkirk", "Tenet", "Oppenheimer"
  ),
  year = c(
    1998, 2000, 2002, 2005,
    2006, 2008, 2010,
    2012, 2014, 2017,
    2020, 2023
  )
)

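# Request each movie in turn and collect the results
movies_list <- list()

for (i in seq_len(nrow(titles))) {
  movies_list[[i]] <- get_movie(titles$title[i], titles$year[i])
}

# Stack the per-movie data frames into a single table
movies_df <- bind_rows(movies_list)

write.csv(movies_df, "nolan_movies.csv", row.names = FALSE)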
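
The same loop can also be written without an index variable. This is a minimal sketch using `purrr::map2()`; so that it runs offline, it swaps the real request for a hypothetical stand-in function, `fake_get_movie()`, and uses a shortened title list (`demo_titles`).

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for get_movie() so the example runs without the API
fake_get_movie <- function(title, year) {
  data.frame(Title = title, Year = as.character(year))
}

demo_titles <- data.frame(
  title = c("Memento", "Inception"),
  year = c(2000, 2010)
)

# map2() walks the two columns in parallel and returns a list of data frames,
# which bind_rows() then stacks into one table
demo_df <- map2(demo_titles$title, demo_titles$year, fake_get_movie) |>
  bind_rows()
```

Replacing `fake_get_movie` with the real request function would give the same result as the loop.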

Then I needed to clean some of the data by converting the columns that hold numbers to a numeric type. The API returns these values as text, and we don’t want R to treat our numeric data as text, because that would cause problems in calculations and plots down the line.

movies_df$Year <- as.numeric(movies_df$Year)
movies_df$imdbRating <- as.numeric(movies_df$imdbRating)
movies_df$imdbVotes <- as.numeric(gsub(",", "", movies_df$imdbVotes))

write.csv(movies_df, "nolan_movies.csv", row.names = FALSE)
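
The effect of this cleaning step is easier to see on a tiny table. The sketch below uses made-up placeholder values, and it assumes that missing values arrive as the text "N/A"; turning those into `NA` first avoids coercion warnings from `as.numeric()`.

```r
# Made-up values in the text form the cleaning step expects (not real data)
demo <- data.frame(
  Year = c("2008", "2010"),
  imdbRating = c("9.0", "N/A"),
  imdbVotes = c("1,234,567", "N/A"),
  stringsAsFactors = FALSE
)

# Assumption: missing values are marked with the text "N/A"
demo[demo == "N/A"] <- NA

demo$Year <- as.numeric(demo$Year)
demo$imdbRating <- as.numeric(demo$imdbRating)
# Remove thousands separators before converting
demo$imdbVotes <- as.numeric(gsub(",", "", demo$imdbVotes))

str(demo)
```

After the conversion, `str(demo)` reports all three columns as `num`, with `NA` where the text was "N/A".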

The last thing that I did was make a graph, because I wanted to see how the movies were rated. The plot puts the release year on the x-axis and the IMDb rating on the y-axis, so you match each point on the line to a movie by the year it came out. The best movie based on rating was The Dark Knight, and the top 3 were The Dark Knight, Inception, and Interstellar.

movies_df <- read.csv("nolan_movies.csv")

ggplot(movies_df, aes(Year, imdbRating)) +
  geom_line()
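
Matching years to movies by eye takes some effort, so one possible refinement is to label each point with its title. This is only a sketch: the data frame `demo_movies` below holds made-up placeholder titles and ratings so that the example runs without the CSV file.

```r
library(ggplot2)

# Placeholder data with the same column names as the cleaned table
demo_movies <- data.frame(
  Title = c("Movie A", "Movie B", "Movie C"),
  Year = c(2000, 2005, 2010),
  imdbRating = c(7.5, 8.2, 8.8)
)

p <- ggplot(demo_movies, aes(Year, imdbRating)) +
  geom_line() +
  geom_point() +
  # Print each title just above its point
  geom_text(aes(label = Title), vjust = -1) +
  labs(x = "Release year", y = "IMDb rating")

p
```

Swapping `demo_movies` for the real table would label every movie directly on the line.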