Introduction

  • For my final project I extended the Week 2 assignment using the knowledge we have gained throughout the semester.
  • I will show the ratings of the movies Michael Bay has directed, collected from multiple sources, and normalize them into one final rating for the end user.

OMDB API

  • Open API: signing up gives a key that allows a limited number of API calls per day
  • Provides a lot of information on each movie, including the scores from IMDB, Rotten Tomatoes, and Metacritic
  • Example of an API call
##            Length Class      Mode     
## Title      1      -none-     character
## Year       1      -none-     character
## Rated      1      -none-     character
## Released   1      -none-     character
## Runtime    1      -none-     character
## Genre      1      -none-     character
## Director   1      -none-     character
## Writer     1      -none-     character
## Actors     1      -none-     character
## Plot       1      -none-     character
## Language   1      -none-     character
## Country    1      -none-     character
## Awards     1      -none-     character
## Poster     1      -none-     character
## Ratings    2      data.frame list     
## Metascore  1      -none-     character
## imdbRating 1      -none-     character
## imdbVotes  1      -none-     character
## imdbID     1      -none-     character
## Type       1      -none-     character
## DVD        1      -none-     character
## BoxOffice  1      -none-     character
## Production 1      -none-     character
## Website    1      -none-     character
## Response   1      -none-     character
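A call like the one summarized above could be sketched as follows. This is a minimal sketch, not the code used for these slides: the helper name `build_omdb_url` and the `OMDB_API_KEY` environment variable are hypothetical, while `t` (title) and `apikey` are OMDb query parameters. The request only runs when a key is set and `jsonlite` is installed.

```r
# Hypothetical helper: build an OMDb request URL for a movie title.
build_omdb_url <- function(title, api_key) {
  paste0(
    "https://www.omdbapi.com/",
    "?t=", utils::URLencode(title, reserved = TRUE),  # title to look up
    "&apikey=", api_key                               # key obtained at sign-up
  )
}

# Assumption: the key is stored in an environment variable named OMDB_API_KEY.
api_key <- Sys.getenv("OMDB_API_KEY")

if (nzchar(api_key) && requireNamespace("jsonlite", quietly = TRUE)) {
  movie <- jsonlite::fromJSON(build_omdb_url("Bad Boys", api_key))
  print(summary(movie))   # one entry per field, as in the output above
  print(movie$Ratings)    # data frame with Source and Value columns
}
```

Keeping the key in an environment variable avoids hard-coding it into the presentation source.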

RogerEbert.com

  • Another rating website, but without a RESTful API
  • Has a rated list of the movies that Michael Bay directed
  • Its rating system scores movies out of 4
  • I manually entered the ratings into an Excel spreadsheet and imported it into the data set
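Combining the manually entered ratings with the API data could look roughly like this. It is a sketch under assumptions: the column names `imdb` and `ebert` are hypothetical, the two data frames are built inline instead of being read from the API and the spreadsheet, and only three movies from the original-ratings table are used.

```r
# Ratings pulled from the API (a few rows, values from the original-ratings table).
api_ratings <- data.frame(
  Movie = c("Bad Boys", "The Rock", "Armageddon"),
  imdb  = c("6.9/10", "7.4/10", "6.7/10"),
  stringsAsFactors = FALSE
)

# Manually entered ratings; in practice these would be imported from the spreadsheet.
ebert_ratings <- data.frame(
  Movie = c("The Rock", "Armageddon"),
  ebert = c("3.5/4.0", "1.0/4.0"),
  stringsAsFactors = FALSE
)

# Left join: keep every API movie and fill NA where no manual rating exists.
combined <- merge(api_ratings, ebert_ratings, by = "Movie", all.x = TRUE)
print(combined)
```

A left join keeps movies such as Bad Boys, which has no rating from the manual source and therefore shows `NA`.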

Ratings

  • The ratings are from 3 different sources
  • Each source has a different rating system
  • The different ratings need to be normalized to a common scale
##                    Source  Value
## 1 Internet Movie Database 6.9/10
## 2         Rotten Tomatoes    42%
## 3              Metacritic 41/100
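Before normalizing, the rating strings have to be turned into numbers. One possible way to do that is sketched below; `parse_rating` is a hypothetical helper, not a function from the original project.

```r
# Hypothetical helper: turn a rating string into its numeric score.
parse_rating <- function(value) {
  # Drop a trailing "%" or everything from "/" onward (the scale), keep the score.
  as.numeric(sub("(/.*|%)$", "", value))
}

ratings <- data.frame(
  Source = c("Internet Movie Database", "Rotten Tomatoes", "Metacritic"),
  Value  = c("6.9/10", "42%", "41/100"),
  stringsAsFactors = FALSE
)

ratings$Score <- parse_rating(ratings$Value)
print(ratings)
```

The scores are still on different scales at this point (out of 10, out of 100, and a percentage), which is why a normalization step is needed.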

Min Max Normalization

  • Simpler to implement than a Z score
  • The population mean and standard deviation needed for a Z score are not available
  • Allows for bounded range
  • Returns a value from 0 to 1

\[ X' = \frac{(X - X_{min})}{X_{max} - X_{min}} \]
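The formula can be sketched in R as below, applied to the Bad Boys ratings from the Ratings slide. The bounds are an assumption: using 1 as the minimum and the top of each scale as the maximum reproduces the normalized values reported in the results, but the slides do not state which bounds were actually used.

```r
# Min-max normalization: rescale x from [x_min, x_max] to [0, 1].
min_max <- function(x, x_min, x_max) {
  (x - x_min) / (x_max - x_min)
}

# Bad Boys ratings; a minimum of 1 and the scale maximum are assumed bounds.
imdb       <- min_max(6.9, x_min = 1, x_max = 10)
tomatoes   <- min_max(42,  x_min = 1, x_max = 100)
metacritic <- min_max(41,  x_min = 1, x_max = 100)

print(round(c(imdb, tomatoes, metacritic), 7))
# 0.6555556 0.4141414 0.4040404
```

For example, the IMDB value works out to (6.9 - 1) / (10 - 1) = 5.9 / 9, or about 0.6556.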

Results

Here are the original ratings before normalization.

Movie         Internet Movie Database  Rotten Tomatoes  Metacritic  RogerEbert.com
Bad Boys      6.9/10                   42%              41/100      NA
The Rock      7.4/10                   66%              58/100      3.5/4.0
Armageddon    6.7/10                   38%              42/100      1.0/4.0
Pearl Harbor  6.2/10                   24%              44/100      1.5/4.0
Bad Boys II   6.6/10                   23%              38/100      1.0/4.0
The Island    6.8/10                   40%              50/100      3.0/4.0

Results

We can see that IMDB (Internet Movie Database) generally scores Michael Bay movies higher than the other rating sources do. This should not be an issue as long as IMDB rates the movies consistently.


Results

Here are the ratings after normalization.

Movie         Internet Movie Database  Rotten Tomatoes  Metacritic  RogerEbert.com
Bad Boys      0.6555556                0.4141414        0.4040404   NA
The Rock      0.7111111                0.6565657        0.5757576   0.8333333
Armageddon    0.6333333                0.3737374        0.4141414   0.0000000
Pearl Harbor  0.5777778                0.2323232        0.4343434   0.1666667
Bad Boys II   0.6222222                0.2222222        0.3737374   0.0000000
The Island    0.6444444                0.3939394        0.4949495   0.6666667
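The normalized ratings above can be combined into one score per movie by averaging across sources. The sketch below is one way to do it. The short column names and the `Final` column are hypothetical, and skipping missing ratings with `na.rm = TRUE` is an assumption about how the `NA` for Bad Boys should be handled.

```r
# Normalized ratings copied from the table above.
normalized <- data.frame(
  Movie = c("Bad Boys", "The Rock", "Armageddon",
            "Pearl Harbor", "Bad Boys II", "The Island"),
  imdb       = c(0.6555556, 0.7111111, 0.6333333, 0.5777778, 0.6222222, 0.6444444),
  tomatoes   = c(0.4141414, 0.6565657, 0.3737374, 0.2323232, 0.2222222, 0.3939394),
  metacritic = c(0.4040404, 0.5757576, 0.4141414, 0.4343434, 0.3737374, 0.4949495),
  ebert      = c(NA, 0.8333333, 0.0000000, 0.1666667, 0.0000000, 0.6666667),
  stringsAsFactors = FALSE
)

# Average the available sources for each movie (assumption: ignore NA values).
score_cols <- c("imdb", "tomatoes", "metacritic", "ebert")
normalized$Final <- rowMeans(normalized[, score_cols], na.rm = TRUE)

# Rank movies from highest to lowest final score.
ranked <- normalized[order(-normalized$Final), c("Movie", "Final")]
print(ranked)
```

Among these six movies, The Rock comes out on top with an average of about 0.69, followed by The Island at 0.55.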

Results

With all the ratings normalized and averaged, a simple graph shows which Michael Bay movies rate best across the different sources. We can see that The Rock is the most popular Michael Bay movie.

Challenges

  • Finding a way to normalize the scores into a single score
  • I tried hard to use the Z score method and to obtain the population mean and standard deviation
  • This showed that sometimes the easiest solution is the better one
  • Using R Presentation instead of PowerPoint

Conclusion

  • The Rock is the most popular Michael Bay movie, followed by Transformers and The Island.
  • There are many different ways to normalize scores
  • IMDB scores are a lot higher than those from the other sources