For this assignment I will be analyzing a list of the 100 best movies, ranked using metrics collected by ultimatemovierankings.com. I have always been a big fan of movies and film, and thought that analyzing some of the highest-rated movies would be fun and interesting. This document answers 5 questions about the data, with a relevant visual for each question.
The data comes from ultimatemovierankings.com.
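library(rvest)    # read_html
library(XML)      # readHTMLTable
library(dplyr)    # rename, mutate, %>%
library(ggplot2)  # ggplot and the geoms used below
Movie_Ranking <- read_html("https://www.ultimatemovierankings.com/top-250-movies/")
Movies_http <- "http://www.ultimatemovierankings.com/top-250-movies/"
Ranking_Table <- readHTMLTable(Movies_http, stringsAsFactors = FALSE)
length(Ranking_Table) # Number of tables parsed from the page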
## [1] 1
movies <- data.frame(Ranking_Table[[1]])
colnames(movies)
## [1] "Rank" "Movie..Year."
## [3] "Star.of.Movie" "Director.of.Movie"
## [5] "Domestic.B.O..Adjusted..mils.." "Critic.Audience.Rating"
## [7] "Oscar.Nom...Win" "UMR.Score"
# Replace the period-filled column names with readable ones
movies <- rename(movies,
  "Movie Year" = "Movie..Year.",
  "Star(s) of Movie" = "Star.of.Movie",
  "Director of Movie" = "Director.of.Movie",
  "Domestic Box Office Adjusted Mils." = "Domestic.B.O..Adjusted..mils..",
  "Critic Audience Rating" = "Critic.Audience.Rating",
  "Oscar Noms/wins" = "Oscar.Nom...Win",
  "UMR Score" = "UMR.Score")
colnames(movies)
## [1] "Rank" "Movie Year"
## [3] "Star(s) of Movie" "Director of Movie"
## [5] "Domestic Box Office Adjusted Mils." "Critic Audience Rating"
## [7] "Oscar Noms/wins" "UMR Score"
# Convert the score columns to numeric (stripping "%") and keep the text columns as character
movies <-
  movies %>%
  mutate(`Movie Year` = as.character(`Movie Year`),
         `Star(s) of Movie` = as.character(`Star(s) of Movie`),
         `Director of Movie` = as.character(`Director of Movie`),
         `Critic Audience Rating` = as.numeric(gsub("%", "", as.character(`Critic Audience Rating`))),
         `UMR Score` = as.numeric(`UMR Score`))
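The cleaning step above does not touch the box office column, and with stringsAsFactors = FALSE that column is read in as text, which ggplot would treat as a discrete axis. Below is a minimal sketch of one way to convert it, assuming the values look something like "$1,895.4". The site's exact formatting has not been checked here, and the sample strings are made up for illustration.

```r
# Hypothetical sample values; the real column's formatting is an assumption
box_office_raw <- c("$1,895.4", "$642.0", "$98.7")

# Drop everything except digits and the decimal point, then convert to numeric
box_office_num <- as.numeric(gsub("[^0-9.]", "", box_office_raw))

box_office_num
```

If the real values follow this pattern, the same gsub() call could be placed inside the mutate() step alongside the other conversions.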
I want to see if there is a correlation between a movie’s rating and how much it made at the box office. For this analysis I will plot the two variables against each other in a scatter plot.
ggplot(data = movies, aes(x = `Critic Audience Rating`, y = `Domestic Box Office Adjusted Mils.`)) +
  geom_point(color = "blue", position = "jitter") +
  ggtitle("Domestic Box Office by Audience Score") +
  xlab("Rating") +
  ylab("Box Office")
Looking at this graph, there doesn’t seem to be any relationship between how much a movie made at the box office and how high its rating is. The points look relatively random. Further analysis for this question would be to include other sources, or to look at more movies.
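A visual impression like this can be backed up with a number. The sketch below runs a Pearson correlation test with base R's cor.test(). The two vectors are made-up toy values standing in for the rating and box office columns, not the real data, so the output says nothing about the actual movies.

```r
# Made-up toy values standing in for the two columns; not the real data
rating     <- c(72, 80, 85, 88, 90, 93, 95)
box_office <- c(310, 150, 420, 95, 260, 510, 180)

# Pearson correlation, with a test of the null hypothesis that it is zero
result <- cor.test(rating, box_office, method = "pearson")

result$estimate  # correlation coefficient, between -1 and 1
result$p.value   # chance of a correlation at least this strong if the true one were zero
```

An estimate close to 0 with a large p-value would agree with the "relatively random" reading of the scatter plot.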
How do the Critic Audience Rating and the UMR Score compare? I want to see if there is a correlation between the two and if one rates movies higher than the other. For this, I will be looking at another scatter plot.
ggplot(data = movies, aes(x = `Critic Audience Rating`, y = `UMR Score`)) +
  geom_point(color = "pink", position = "jitter") +
  ggtitle("Critic Audience Rating vs. UMR Score") +
  geom_smooth(method = "lm")
This graph suggests a positive relationship between the two scores that this website uses to rate movies: as the Critic Audience Rating goes up, so does the UMR Score. To further this analysis, I would want to look at more movies, and also repeat the same analysis within specific genres to see if there are differences there.
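The trend line drawn by geom_smooth() is a linear model, and fitting that model directly with lm() gives the slope and fit quality as numbers. This is a minimal sketch on a made-up toy data frame; the column names critic and umr are placeholders, not the real columns.

```r
# Made-up toy values; critic and umr are placeholder column names
toy <- data.frame(
  critic = c(70, 75, 80, 85, 90, 95),
  umr    = c(85, 86, 88, 90, 93, 96)
)

# Fit umr as a linear function of critic
fit <- lm(umr ~ critic, data = toy)

coef(fit)               # intercept and slope of the fitted line
summary(fit)$r.squared  # share of the variance in umr explained by critic
```

A positive slope corresponds to the upward trend in the plot, and the R-squared value indicates how tightly the points follow the line.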
These are the top 100 movies, so I want to know what the distribution looks like for the Critic Audience Ratings and the UMR Scores. Since these are considered the best movies based on rankings, it would be cool for me to see a curve of each score and where the averages lie for the top 100 movies. I will be looking at density plots for this analysis.
ggplot(data = movies, aes(`Critic Audience Rating`)) +
  geom_density(fill = "purple")
ggplot(data = movies, aes(`UMR Score`)) +
  geom_density(fill = "gold")
These two density curves are interesting to look at next to each other. They both have an average landing around 85; however, the Critic Audience Rating ranges from the high 60s to about 95, whereas the UMR Score ranges only from about 84 to 96. The two have about the same average, even though one has a much greater range. Further analysis for this would be to look at even more ranking sites and sources, such as IMDb or Metacritic.
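Averages and ranges read off a density plot are approximate, so it can help to compute them. The sketch below builds a small summary table with base R. The data frame is made up for illustration and its columns critic and umr are placeholders for the two score columns.

```r
# Made-up toy values; not the real data
toy <- data.frame(
  critic = c(68, 79, 85, 90, 95),
  umr    = c(84, 86, 88, 92, 96)
)

# Mean, standard deviation, minimum, and maximum for each score column
stats <- sapply(toy, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})

stats
```

Comparing the sd, min, and max rows across the two columns makes the difference in spread explicit even when the means are close.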
Does UMR Score influence Oscar nominations or wins? I want to see if there is any correlation between a particular film’s UMR Score and whether it is nominated for Oscars. I will be looking at another scatter plot for this.
ggplot(data = movies, aes(x = `Oscar Noms/wins`, y = `UMR Score`)) +
  geom_point(color = "magenta", position = "jitter") +
  ggtitle("UMR Score Versus Oscar Nominations") +
  xlab("Oscar Nom/win") +
  ylab("UMR Score") +
  geom_smooth(method = "lm")
The results from this graph seem to be inconclusive. While there may be a slight correlation between UMR Score and Oscar nominations/wins, it doesn’t seem to be very strong. My guess is that Oscar-nominated movies, and especially the winners, almost always have high scores, so there is little variation left for a relationship to show up in. Further analysis here would be doing it for Oscar wins specifically, and comparing nominations and wins to other ranking and rating sites as well.
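Looking at wins specifically requires separating nominations from wins, since the table stores them in one combined column. The sketch below shows one way to split such a column, assuming each value is written as "nominations / wins". That layout is a guess based on the column name, and the sample strings are made up.

```r
# Hypothetical values; the "nominations / wins" layout is an assumption
oscar_raw <- c("10 / 8", "3 / 0", "11 / 11")

# Split each string on the slash, ignoring any spaces around it
parts <- strsplit(oscar_raw, "[[:space:]]*/[[:space:]]*")

# First piece is nominations, second piece is wins
oscar_noms <- as.numeric(sapply(parts, `[`, 1))
oscar_wins <- as.numeric(sapply(parts, `[`, 2))
```

With two numeric vectors, each one could be plotted against the UMR Score on its own numeric axis.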
I want to look further at the relationship between UMR Scores and Critic Audience Ratings, this time with a box plot of the two across all of the top 100 movies.
ggplot(data = movies, aes(x = `Critic Audience Rating`, y = `UMR Score`)) +
  geom_boxplot()
This graph did not turn out how I hoped. I was trying to draw separate boxes for different ranges of Critic Audience Rating, for example movies rated 70-80, 80-90, and 90-95. The goal was to see if higher-rated movies had less skew than, for example, movies rated in the low 80s on the Critic Audience Rating. However, this is still a fun boxplot to look at that makes a decent summary of the entire table of data. Further analysis would be splitting this up by genre or by year to see the correlations there, and getting it to work for splitting the plots up by how the movies score in one category.
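geom_boxplot() draws one box per distinct group on the x axis, so a continuous x such as a rating ends up as a single box. One possible way to get a box per rating range is to bin the rating with cut() first. This is a sketch on made-up toy data; the break points and the names critic, umr, and rating_band are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# Made-up toy values; not the real data
toy <- data.frame(
  critic = c(72, 78, 81, 84, 88, 89, 91, 93, 95),
  umr    = c(85, 86, 86, 88, 87, 90, 92, 94, 96)
)

# Bin the continuous rating into the ranges [70,80), [80,90), and [90,100]
toy$rating_band <- cut(toy$critic, breaks = c(70, 80, 90, 100),
                       right = FALSE, include.lowest = TRUE)

# A discrete x axis gives one box per rating band
p <- ggplot(toy, aes(x = rating_band, y = umr)) +
  geom_boxplot()
print(p)
```

Any rating below the lowest break would become NA, so the breaks would need to be widened to cover the full range of the real column.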